[HACKERS] remapped localhost causes connections to localhost to fail using Postgres

2015-12-04 Thread Dann Corbit
Using a Windows computer, editing the file:
C:\WINDOWS\system32\drivers\etc\hosts
the localhost entry was remapped to the machine name by adding the following 
line:
127.0.0.1   

After this change, Postgres would not allow access using the address localhost.
Only using the machine name to access the database was possible.
Is this by design?

In other words
psql -h localhost
fails, but:
psql -h 
succeeds.



[HACKERS] Re: [GENERAL] postgresql-9.3.1-1-windows-x64.exe does not install correctly for me

2013-10-31 Thread Dann Corbit
The PostgreSQL installer for Windows 64 appears to be broken for Microsoft 
Windows Server 2012 Standard.

Even after uninstalling, removing the entire postgresql directory structure, 
and running the installer as administrator, I get this error:

fixing permissions on existing directory C:/Program Files/PostgreSQL/9.3/data 
... initdb: could not change permissions of directory C:/Program 
Files/PostgreSQL/9.3/data: Permission denied

I was able to get it to install by giving permissions to the postgresql folder 
to applications and services, but I am not really sure what allowed it to 
complete.

However, that sort of installation would be unsafe for regular users.

I only use it for testing, so it does not really matter much in my case (though 
I would like it much better if I did not have to munge around to make the 
install work).

-Original Message-
From: Adrian Klaver [mailto:adrian.kla...@gmail.com] 
Sent: Thursday, October 31, 2013 5:03 PM
To: Dann Corbit; 'pgsql-gene...@postgresql.org'
Subject: Re: [GENERAL] postgresql-9.3.1-1-windows-x64.exe does not install 
correctly for me

On 10/31/2013 11:53 AM, Dann Corbit wrote:
 postgresql-9.3.1-1-windows-x64.exe

 Problem running post-install step.  Installation may not complete 
 correctly

 The database cluster initialization failed.

 Is there an installation log I can examine to determine the problem 
 more completely?


http://www.enterprisedb.com/resources-community/pginst-guide#troubleshooting


--
Adrian Klaver
adrian.kla...@gmail.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New regression test time

2013-06-29 Thread Dann Corbit
-Original Message-
From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Josh Berkus
Sent: Saturday, June 29, 2013 3:00 PM
To: Andrew Dunstan
Cc: Alvaro Herrera; pgsql-hackers@postgresql.org; Robins Tharakan
Subject: Re: [HACKERS] New regression test time

On 06/29/2013 02:14 PM, Andrew Dunstan wrote:
 AIUI: They do test feature use and errors that have cropped up in the 
 past that we need to beware of. They don't test every bug we've ever 
 had, nor do they exercise every piece of code.

If we don't have a test for it, then we can break it in the future and not know 
we've broken it until .0 is released.  Is that really a direction we're happy 
going in?

 
 Maybe there is a good case for these last two in a different set of tests.

If we had a different set of tests, that would be a valid argument.  But we 
don't, so it's not.  And nobody has offered to write a feature to split our 
tests either.

I have to say, I'm really surprised at the level of resistance people on this 
list are showing to the idea of increasing test coverage.  I thought that 
Postgres was all about reliability?  For a project as mature as we are, our 
test coverage is abysmal, and I think I'm starting to see why.

An ounce of prevention is worth a pound of cure.
The cost of a bug rises exponentially, starting at requirements gathering and 
ending at the customer.
Where I work, we have two computer rooms full of machines that run tests 
around the clock, 24x365.
Even so, a full regression takes well over a week because we perform hundreds 
of thousands of tests.
All choices of this kind are trade-offs.  But in such situations, my motto is:
Do whatever will make the customer prosper the most.
IMO-YMMV




Re: [HACKERS] ancient sequence point bug

2013-04-16 Thread Dann Corbit
-Original Message-
From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane
Sent: Tuesday, April 16, 2013 7:52 PM
To: Peter Eisentraut
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] ancient sequence point bug

Peter Eisentraut pete...@gmx.net writes:
 This code in bootstrap.c contains a sequence point violation (or 
 whatever that is really called):

 while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
 {
 (*app)->am_oid = HeapTupleGetOid(tup);
 memmove((char *) (*app++)->am_typ,
 (char *) GETSTRUCT(tup),
 sizeof((*app)->am_typ));
 }

What exactly is the violation?  sizeof() is a side-effect-free compile time 
constant, and the first value to be passed to memmove seems perfectly well 
defined.

Unless it is C99 and the object is a VLA, which must be evaluated at runtime.  
I guess that (*app)->am_typ is not a VLA, especially since PostgreSQL does not 
require C99 to compile.

I grant that this is not terribly good coding style, but I don't believe 
there's an actual risk here.

I agree that there is no risk in the sizeof operator.
According to my understanding, even this is legal:

unsigned long long *p = 0;
size_t stupid = sizeof (*p);
printf("size of a long long is %u\n", (unsigned) stupid);

But I also guess that most compilers will have a cow when scanning it.

 In commit 1aebc361, another place in the same file was fixed like this:

Well, I don't really remember why I did that twelve years ago, but seeing that 
there are a number of purely-cosmetic changes in that commit, I'm thinking it 
was only meant to be cosmetic.

I certainly have no objection to making this code look more like that code, 
I'm just not seeing that it's a bug.

You are right. It's not a bug.  However, I would not be surprised if GCC or 
Clang would shriek at the higher warning levels, usually not being able to 
apply artificial intelligence to solve problems that seem simple to humans.
In the interest of quiet compiles, equivalent code that does not produce a 
warning seems like a good idea.  IMO-YMMV.
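For what it's worth, a warning-proof shape for such a loop simply hoists the pointer increment onto its own statement. The sketch below is a self-contained analogue, not PostgreSQL's bootstrap.c; the struct and function names are stand-ins of my own:

```c
#include <assert.h>
#include <string.h>

/* Self-contained analogue of the bootstrap.c loop.  The original code
 * incremented app inside one memmove argument while naming *app inside a
 * sizeof in another argument; since sizeof does not evaluate its operand
 * (absent C99 VLAs) that is well defined, but hoisting the increment
 * makes it obvious and keeps picky compilers quiet.  Types here are
 * stand-ins, not PostgreSQL's. */
struct am_entry {
    int  am_oid;
    char am_typ[16];
};

static void
copy_entries(struct am_entry **app, const struct am_entry *rows, int nrows)
{
    int i;

    for (i = 0; i < nrows; i++) {
        (*app)->am_oid = rows[i].am_oid;
        memmove((*app)->am_typ, rows[i].am_typ, sizeof((*app)->am_typ));
        app++;                    /* the increment gets its own statement */
    }
}
```

The behavior is identical; the only change is that no argument list both reads and modifies `app`.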

   regards, tom lane







[HACKERS] Re: Why do we still perform a check for pre-sorted input within qsort variants?

2013-03-09 Thread Dann Corbit
-Original Message-
From: gsst...@gmail.com [mailto:gsst...@gmail.com] On Behalf Of Greg Stark
Sent: Friday, March 08, 2013 4:59 PM
To: Dann Corbit
Cc: Bruce Momjian; Peter Geoghegan; Robert Haas; Tom Lane; PG Hackers
Subject: Re: Why do we still perform a check for pre-sorted input within qsort 
variants?

On Fri, Mar 8, 2013 at 7:43 PM, Dann Corbit dcor...@connx.com wrote:
 Checking for pre-sorted input will not make the routine faster on average.
 However, it prevents a common perverse case where the runtime goes 
 quadratic, so sorting 10^6 elements will take K*10^12 operations when the 
 bad condition does occur.
 Checking for pre-sorted data can be thought of as an insurance policy.  It's 
 kind of a pain to pay the premiums, but you sure are glad you have it when 
 an accident occurs.
 Because the check is linear, and the algorithm is O(n*log(n)), the cost is 
 not terribly significant.

Well pre-sorted inputs are not the only quadratic case. If we really wanted to 
eliminate quadratic cases we could implement the pivot choice algorithm that 
guarantees n*log(n) worst-case. 

There are three things that give qsort fits:
1.  Large ascending partitions.
2.  Large descending partitions.
3.  Bad choice of the median.
If you examine the link in the previous article I gave:
http://www.cs.toronto.edu/~zhouqq/postgresql/sort/sort.html
It has a subfolder located by following this link:
http://www.cs.toronto.edu/~zhouqq/postgresql/sort/dann/
In that folder is a file called qsortpdq.c
In file qsortpdq.c is a function:
void qsortPDQ(void *a, size_t n, size_t es, int (*cmp) (const void *, const 
void *));
which is an interface that checks for ascending partitions, descending 
partitions, and does a median of 3 choice for the pivot.
Now, even this routine will have very rare inputs that can cause it to go 
quadratic.  The probability of one of these particular inputs is very low.

If you want 100% guarantee against quadratic behavior, then qsortpdq can be 
wrapped in heapsort with a recursion depth check.  That sort would never go 
quadratic.  That method is called introspective sort.
Here are some links that describe introspective sort:
http://en.wikipedia.org/wiki/Introsort
https://secweb.cs.odu.edu/~zeil/cs361/web/website/Lectures/heapsort/pages/introspect.html
And this is the real thing, right from the horse's mouth:
http://www.cs.rpi.edu/~musser/gp/introsort.ps

There is no such thing as a quicksort that never goes quadratic.  It was 
formally proven.  I cannot find the proof that I once read, but the article "A 
Killer Adversary for Quicksort" by M. D. McIlroy answers pretty nearly.

Here is the summary from that paper:

SUMMARY
Quicksort can be made to go quadratic by constructing input on the fly in 
response to the sequence of items compared. The technique is illustrated by a 
specific adversary for the standard C qsort function. The general method works 
against any implementation of quicksort, even a randomizing one, that 
satisfies certain very mild and realistic assumptions.

But that will increase the average run-time. If we're not doing that then I 
think your whole argument falls apart. We do care about the average case as 
well as the worst-case.
The makefile in folder:
http://www.cs.toronto.edu/~zhouqq/postgresql/sort/dann/
will build the sort system and test it.
The makefile in folder:
http://www.cs.toronto.edu/~zhouqq/postgresql/sort/sort.html
will build the simpler sort system and test it.

I would suggest, in addition, the use of this routine to create a tough data 
set for robustness:
http://www.cs.dartmouth.edu/~doug/aqsort.c

There's been a *ton* of research on sorting. I find it hard to believe there 
isn't a pretty solid consensus on which of these defense mechanisms is the 
best trade-off.
I'm partial to chapter 13 of C Unleashed in that regard.  As for the 
trade-offs, the problem is that they really are trade-offs.  However, the cost 
is quite small (benchmark and see) and the risk of not performing the checks 
is enormous.  Mostly sorted data happens all the time in real life, as do 
other perverse distributions that cause problems for sorting algorithms.

My own personal recommendation would be to include every single safeguard (all 
three data checks and a recursion depth check as well).
That's what we do here (I work at a database tools company).





[HACKERS] Re: Why do we still perform a check for pre-sorted input within qsort variants?

2013-03-09 Thread Dann Corbit
-Original Message-
From: gsst...@gmail.com [mailto:gsst...@gmail.com] On Behalf Of Greg Stark
Sent: Saturday, March 09, 2013 11:39 AM
To: Dann Corbit
Cc: Bruce Momjian; Peter Geoghegan; Robert Haas; Tom Lane; PG Hackers
Subject: Re: Why do we still perform a check for pre-sorted input within qsort 
variants?

On Sat, Mar 9, 2013 at 10:32 AM, Dann Corbit dcor...@connx.com wrote:
 There is no such thing as a quicksort that never goes quadratic.  It was 
 formally proven

The median of medians selection of the pivot gives you O(n*log(n)).


No.  It does make O(n*n) far less probable, but it does not eliminate it.  If 
it were possible, then introspective sort would be totally without purpose.
And yet introspective sort is the algorithm of choice for the ANSI/ISO C++ 
standard directly because it will never go quadratic.
Bentley and McIlroy's "Engineering a Sort Function" says this about the final 
qsort source code model chosen for their implementation, in a footnote:
"Of course, quadratic behavior is still possible. One can generate fiendish 
inputs by bugging Quicksort: Consider key values to be unknown initially. In 
the code for selecting a partition element, assign values in increasing order 
as unknown keys are encountered. In the partitioning code, make unknown keys 
compare high."

Interestingly, some standard library qsort routines will go quadratic if simply 
given a set that contains random 0 and 1 values for the input set.

It has been formally proven that no matter what method is used to choose the 
median (other than calculating the exact median of every partition, which 
would make quicksort slower than heapsort) there exists an adversary 
distribution that makes quicksort go quadratic (unfortunately, I can't find 
the proof or I would give you the citation).  Median selection that is 
randomized does not
select the true median unless it operates over all the elements of the entire 
partition.  With some small probability, it makes the worst possible choice for 
the median.  It also does not matter if you choose the first, last and middle 
elements for a median of three (or other evenly selected points for larger 
sample sets) or if you select the sample with randomization.  Both methods have 
adversaries.  Now, the quickselect algorithm can select the median in O(n) but 
that is again an average.  There are also adversary distributions for 
quickselect.

On the other hand, various safeguards can make O(n*n) extremely improbable 
{even probabilities approaching zero}.  There is a tradeoff with more and more 
complicated safeguards.  If I choose large subsets of a partition and calculate 
the true median of the subset, it will make the routine safer from quadratic 
behavior but it will also make it slower.  At some point, you end up with a 
routine no faster than heapsort, while also being much more complex and 
therefore more difficult to maintain.  Hence, the necessity of introspective 
sort.

On the other hand, there are also data specific sorts that have very special 
properties.  There are cache conscious versions of burstsort that can sort 
strings much faster than quicksort or even radix sort according to 
measurements.  There is a special version of LSD radix sort that will sort an 
array in one counting pass and N distribution passes, where N is the radix.  
So, for instance, if you are sorting 4 byte integers and your radix is 256 {one 
byte} then you can sort in 5 passes.  One pass to count the bytes in each 
integer by position, and 4 passes to distribute the data.  This can also be 
done in place.  MSD radix sort can also be used as an outer sort container 
which sorts by distribution until the partitions are small enough and then 
switches to introspective sort (which eventually switches to insertion sort).  
Radix sorts can also be made totally generic via callback routines like 
quicksort.  Instead of a comparison operator, the callback gets request for the 
radix bits for a specific bin number and then returns the radix count bits for 
that specific bin.  For unsigned char and radix 256, this is trivially 
character [i] from the string for bin [i], for example.
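The five-pass LSD scheme described above can be sketched for 32-bit keys with radix 256: one counting pass builds all four byte histograms at once, then four stable distribution passes order the data. This is a hedged illustration in my own naming; it ping-pongs through a scratch buffer rather than doing the in-place distribution the post mentions:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Sort n 32-bit keys ascending.  Five passes total: one counting pass,
 * then four distribution passes from least to most significant byte. */
void
radix_sort_u32(uint32_t *a, size_t n)
{
    size_t count[4][256] = {{0}};
    uint32_t *tmp, *src, *dst;
    size_t i;
    int pass;

    if (n < 2)
        return;
    tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return;                   /* out of memory: leave input untouched */
    src = a;
    dst = tmp;

    /* Pass 1 of 5: count every byte of every key, all positions at once. */
    for (i = 0; i < n; i++)
        for (pass = 0; pass < 4; pass++)
            count[pass][(a[i] >> (8 * pass)) & 0xFF]++;

    /* Passes 2-5: stable distribution by byte; the histograms stay valid
     * because the multiset of keys never changes. */
    for (pass = 0; pass < 4; pass++) {
        size_t offs[256], total = 0;
        int b;

        for (b = 0; b < 256; b++) {       /* exclusive prefix sums */
            offs[b] = total;
            total += count[pass][b];
        }
        for (i = 0; i < n; i++)
            dst[offs[(src[i] >> (8 * pass)) & 0xFF]++] = src[i];
        { uint32_t *t = src; src = dst; dst = t; }  /* ping-pong buffers */
    }
    free(tmp);                    /* four swaps: result is back in a */
}
```

The generic-callback variant in the text would replace the direct byte extraction `(key >> (8 * pass)) & 0xFF` with a caller-supplied function returning the radix bits for a given element and pass.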

I guess improving sort routines all boils down to how much energy you want to 
put into it.  Donald Knuth once stated that according to measurement, about 40% 
of business related mainframe computer cycles are involved in ordering data.  
So it is clearly an important problem.





[HACKERS] Re: Why do we still perform a check for pre-sorted input within qsort variants?

2013-03-09 Thread Dann Corbit
A Machine-Checked Proof of the Average-Case Complexity of Quicksort in Coq
By Eelis van der Weegen and James McKinna
Institute for Computing and Information Sciences
Radboud University Nijmegen
Heijendaalseweg 135, 6525 AJ Nijmegen, The Netherlands

Contains a formal proof, validated by machine logic, that quicksort is 
quadratic in the worst case and also a formal proof of the average runtime 
efficiency.





[HACKERS] Re: Why do we still perform a check for pre-sorted input within qsort variants?

2013-03-09 Thread Dann Corbit
Yes, you are right.  I knew of a median of medians technique for pivot 
selection and I mistook that for the median of medians median selection 
algorithm (which it definitely isn't).
I was not aware of a true linear-time median selection algorithm (which is 
what median of medians accomplishes).  The fastest median selection algorithm 
that I was aware of was quickselect, which is only linear on average.
I think that your analysis is correct, at any rate.

I also think I will enjoy learning and experimenting with the median of medians 
algorithm.
I found a link about it here:
http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm

-Original Message-
From: gsst...@gmail.com [mailto:gsst...@gmail.com] On Behalf Of Greg Stark
Sent: Saturday, March 09, 2013 1:21 PM
To: Dann Corbit
Cc: Bruce Momjian; Peter Geoghegan; Robert Haas; Tom Lane; PG Hackers
Subject: Re: Why do we still perform a check for pre-sorted input within qsort 
variants?

On Sat, Mar 9, 2013 at 8:52 PM, Dann Corbit dcor...@connx.com wrote:
 Median of medians selection of the pivot gives you O(n*log(n)).

 No.  It does make O(n*n)  far less probable, but it does not eliminate it.  
 If it were possible, then introspective sort would be totally without purpose.

No really, quicksort with median of medians pivot selection is most definitely 
O(n*log(n)) worst case. This is textbook stuff. In fact even the introspective 
sort paper mentions it as one of the options to fail over to if the partition 
size isn't decreasing rapidly enough.

The problem is that median of medians is O(n) rather than O(1). That doesn't 
change the O() growth rate since there will be log(n) iterations. But it means 
it contributes to the constant factor and the end result ends up being a 
constant factor larger than heap sort or merge sort. That also explains why 
your reference on the quicksort adversary doesn't apply: it works by ordering 
elements you haven't compared yet and assumes that all but O(1) elements will 
still be eligible for reordering.

In any case I think you're basically right. What we have is basically a broken 
introspective sort that does more work than necessary and then handles fewer 
cases than it should. Implementing a better introspection that detects all 
perverse cases and does so with a lower overhead than the current check is a 
fine idea.


--
greg




[HACKERS] Re: Why do we still perform a check for pre-sorted input within qsort variants?

2013-03-09 Thread Dann Corbit
-Original Message-
From: gsst...@gmail.com [mailto:gsst...@gmail.com] On Behalf Of Greg Stark
Sent: Saturday, March 09, 2013 5:16 PM
To: Dann Corbit
Cc: Bruce Momjian; Peter Geoghegan; Robert Haas; Tom Lane; PG Hackers
Subject: Re: Why do we still perform a check for pre-sorted input within qsort 
variants?

On Sat, Mar 9, 2013 at 10:22 PM, Dann Corbit dcor...@connx.com wrote:
 Yes, you are right.  I knew of a median of medians technique for pivot 
 selection and I mistook that for the median of medians median selection 
 algorithm (which it definitely isn't).
 I was not aware of a true linear-time median selection algorithm (which is 
 what median of medians accomplishes).  The fastest median selection 
 algorithm that I was aware of was quickselect, which is only linear on 
 average.
 I think that your analysis is correct, at any rate.

Hm, I was using the terminology differently than the Wikipedia page. I was 
referring to the recursive median of 5s used as the pivot selection as median 
of medians. And I still called Quicksort or Quickselect using that pivot 
Quicksort or Quickselect with that specific pivot choice algorithm.

When using that pivot choice Quicksort is O(n*log(n)) and Quickselect (Median 
of Medians on Wikipedia) is O(n). But the constant factor becomes larger than 
if the pivot choice algorithm is O(1). I suppose it's more interesting in the 
case of Quickselect since there's no other alternative algorithms that could be 
used that have better constant factors whereas for sorting we have other 
options.


I wonder if it makes sense to use timsort as the fallback for quicksort if the 
partition sizes are skewed. Timsort is specifically designed to handle 
presorted inputs well.  On the other hand it is itself a hybrid sort so it 
might be getting overly complex to make it part of a hybrid algorithm.

--

My opinion (and it is no better than anyone else's) is that for a database you 
have to be very defensive in programming.
Since database systems can contain any possible distribution (and over time 
they likely will encounter almost every possibility) it is important to prevent 
unusual inputs from causing disaster.

The reason we added introspection to the sort here is that we already had 
quicksort with ordered partition check along with a median of three sample from 
three medians of three (not to mention the standard recursion of the smallest 
partition first and switch to insertion at small enough partition size).  Sure 
enough, there was a customer who had a data distribution that caused bad 
behavior.

We have customers with mainframe data where a single table can be 24 GB (I know 
this offhand, there are sure to be some that are larger), so a bad behavior on 
a sort *will* be noticed.

Sorry about the oddball quoting.  I will look at fixing it so that my posts are 
not so hard to grok.





Re: [HACKERS] Why do we still perform a check for pre-sorted input within qsort variants?

2013-03-08 Thread Dann Corbit
-Original Message-
From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Bruce Momjian
Sent: Friday, March 08, 2013 11:22 AM
To: Peter Geoghegan
Cc: Robert Haas; Tom Lane; PG Hackers
Subject: Re: [HACKERS] Why do we still perform a check for pre-sorted input 
within qsort variants?

On Mon, Feb 25, 2013 at 02:31:21PM +, Peter Geoghegan wrote:
 On 25 February 2013 11:49, Robert Haas robertmh...@gmail.com wrote:
  I did attempt to do some tinkering with this while I was playing 
  with it, but I didn't come up with anything really compelling.  You 
  can reduce the number of comparisons on particular workloads by 
  tinkering with the algorithm, but then somebody else ends up doing 
  more comparisons, so it's hard to say whether you've really made 
  things better.  Or at least I found it so.
 
 Right.
 
 To be honest, the real reason that it bothers me is that everything 
 else that our qsort routine does that differs from classic quicksort 
 (mostly quadratic insurance, like the median-of-medians pivot 
 selection, but also the fallback to insertion sort when n < 7) is very 
 well supported by peer reviewed research. Like Tom, I find it 
 implausible that Sedgewick and others missed a trick, where we did 
 not, particularly with something so simple.

Perhaps we are more likely to be fed sorted data than a typical qsort use case.

Checking for pre-sorted input will not make the routine faster on average.
However, it prevents a common perverse case where the runtime goes quadratic, 
so sorting 10^6 elements will take K*10^12 operations when the bad condition 
does occur.
Checking for pre-sorted data can be thought of as an insurance policy.  It's 
kind of a pain to pay the premiums, but you sure are glad you have it when an 
accident occurs.
Because the check is linear, and the algorithm is O(n*log(n)), the cost is not 
terribly significant.
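The "insurance" check is a single linear scan before sorting. A minimal sketch (my own naming, not PostgreSQL's actual qsort code):

```c
#include <assert.h>
#include <stddef.h>

/* Return nonzero if the array is already sorted ascending.  One pass of
 * n - 1 comparisons, so the premium is O(n) against the sort's
 * O(n log n) total -- cheap insurance against a quadratic blowup on
 * already-ordered input. */
static int
is_presorted(const void *a, size_t n, size_t es,
             int (*cmp) (const void *, const void *))
{
    const char *base = (const char *) a;
    size_t i;

    for (i = 1; i < n; i++)
        if (cmp(base + (i - 1) * es, base + i * es) > 0)
            return 0;
    return 1;
}

/* Example comparator for ints. */
static int
int_cmp(const void *x, const void *y)
{
    int a = *(const int *) x, b = *(const int *) y;

    return (a > b) - (a < b);
}
```

A qsort wrapper would call this first and return immediately on success; a matching reverse-order check catches large descending runs the same way.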

Switching to insertion sort for small partitions is almost always a good idea.  
This idea was invented by Richard Singleton as ACM algorithm 347.

A different sort of safeguard, as compared to checking for pre-ordered data, is 
to watch recursion depth.  If we find that we have already gone past a depth of 
log2(n) levels and we are not ready to shift to insertion sort, then there is 
some sort of perverse data set.  A traditional fix is to switch to heap sort at 
this point.  Since heap sort is also O(n*log(n)) {though the constant 
multiplier is larger than for quicksort on average} you never get O(n*n) 
behavior.  This method is called introspective sort.

There are, of course, other clever things that can be tried.  The relaxed weak 
heapsort of Edelkamp and Steigler, for instance, or using tries as in 
"Cache-Efficient String Sorting Using Copying" by Ranjan Sinha.  There is also 
possibly significant benefit to using radix sort for fixed-width data that can 
be easily binned.

I seem to recall that a year or two back some study was done on quicksort 
methodology as used in PostgreSQL.  As I recall, the algorithm used in 
PostgreSQL fared well in the tests.






Re: [HACKERS] Why do we still perform a check for pre-sorted input within qsort variants?

2013-03-08 Thread Dann Corbit
-Original Message-
From: Peter Geoghegan [mailto:peter.geoghega...@gmail.com] 
Sent: Friday, March 08, 2013 12:00 PM
To: Bruce Momjian
Cc: Dann Corbit; Robert Haas; Tom Lane; PG Hackers
Subject: Re: [HACKERS] Why do we still perform a check for pre-sorted input 
within qsort variants?

On 8 March 2013 11:48, Bruce Momjian br...@momjian.us wrote:
 On Fri, Mar 8, 2013 at 07:43:10PM +, Dann Corbit wrote:
 I seem to recall that a year or two back some study was done on 
 quicksort methodology as used in PostgreSQL.  As I recall, the 
 algorithm used in PostgreSQL fared well in the tests.

 Well, that's good to hear.

I wouldn't mind taking a look at that. We're not the only ones that use (more 
or less) that same algorithm. I noticed that Redis does so too, just for 
example (though they didn't get wise to the problems with the swap_cnt 
pessimisation).


Turns out, it was a lot longer ago than I thought (2005!).  The thread was 
called "Which qsort is used", but I also seem to recall that the topic was 
revisited a few times.  See, for instance:

http://www.postgresql.org/message-id/pine.lnx.4.58.0512121138080.18...@eon.cs
http://www.cs.toronto.edu/~zhouqq/postgresql/sort/sort.html






Re: [HACKERS] [help] Is it possible to support remote COPY operation on PG?

2012-10-24 Thread Dann Corbit
You can use the libpq API:
http://www.postgresql.org/docs/9.2/interactive/libpq-copy.html

The PostgreSQL JDBC driver exposes COPY, IIRC.
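For the libpq route, the client-side COPY flow looks roughly like this (a sketch against the documented libpq copy functions; the table `t` and its columns are hypothetical, and error handling is minimal):

```c
#include <stdio.h>
#include <libpq-fe.h>

/* Stream rows into table "t" from the client side; no server-side file
 * is needed.  Returns 0 on success, -1 on failure. */
int
copy_rows(PGconn *conn)
{
    PGresult *res = PQexec(conn, "COPY t (id, name) FROM STDIN");

    if (PQresultStatus(res) != PGRES_COPY_IN) {
        fprintf(stderr, "COPY failed: %s", PQerrorMessage(conn));
        PQclear(res);
        return -1;
    }
    PQclear(res);

    /* Each chunk is raw COPY text format: tab-separated, newline-ended. */
    PQputCopyData(conn, "1\talice\n", 8);
    PQputCopyData(conn, "2\tbob\n", 6);
    if (PQputCopyEnd(conn, NULL) != 1)
        return -1;

    res = PQgetResult(conn);          /* fetch the final command status */
    if (PQresultStatus(res) != PGRES_COMMAND_OK) {
        fprintf(stderr, "COPY end failed: %s", PQerrorMessage(conn));
        PQclear(res);
        return -1;
    }
    PQclear(res);
    return 0;
}
```

Running this requires a live connection from PQconnectdb, so it is illustrative rather than standalone.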


From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Xiong He
Sent: Tuesday, October 23, 2012 11:55 PM
To: PostgreSQL-development
Subject: [HACKERS] [help] Is it possible to support remote COPY operation on PG?

Is it possible to copy some table data from a remote client to the PG database 
server directly, without uploading the data file to the server side in advance?
---
Thanks & Regards,
Xiong He



Re: [HACKERS] CUDA Sorting

2012-02-15 Thread Dann Corbit
-Original Message-
From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Gaetano Mendola
Sent: Wednesday, February 15, 2012 2:54 PM
To: Peter Geoghegan; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] CUDA Sorting

On 15/02/2012 23:11, Peter Geoghegan wrote:
 On 15 February 2012 20:00, Gaetano Mendola mend...@gmail.com wrote:
 On 13/02/2012 19:48, Greg Stark wrote:

 I don't think we should be looking at either CUDA or OpenCL directly.
 We should be looking for a generic library that can target either 
 and is well maintained and actively developed. Any GPU code we write 
 ourselves would rapidly be overtaken by changes in the hardware and 
 innovations in parallel algorithms. If we find a library that 
 provides a sorting api and adapt our code to use it then we'll get 
 the benefits of any new hardware feature as the library adds support for 
 them.


 I think one option is to make the sort function pluggable with a 
 shared library/dll. I see several benefits from this:

   - It could be in the interest of the hardware vendor to provide the 
 most powerful sort implementation (I'm sure for example that the TBB sort 
 implementation is faster than pg_sort)

   - It can permit people to play with it without being deeply 
 involved in pg development and stuff.

 Sorry, but I find it really hard to believe that the non-availability 
 of pluggable sorting is what's holding people back here. Some vanguard 
 needs to go and prove the idea by building a rough prototype before we 
 can even really comment on what an API should look like. For example, 
 I am given to understand that GPUs generally sort using radix sort - 
 resolving the impedance mismatch that prevents someone from using a 
 non-comparison based sort sure sounds like a lot of work for an 
 entirely speculative reward.

AFAIK the thrust library uses radix sort if the keys you are sorting are POD 
data comparable with a < operator; otherwise it does the comparison-based sort 
using the operator provided.

http://docs.thrust.googlecode.com/hg/modules.html

I'm not saying that the non-availability of pluggable sort completely holds 
people back; I'm saying that it will simplify the process now and in the 
future.  Of course, that's my opinion.

 Someone who cannot understand tuplesort, which is not all that 
 complicated, has no business trying to build GPU sorting into 
 Postgres.

That sounds a bit harsh. I'm one of those, indeed; I haven't looked into the 
details, not having enough time for it. At work we do GPU computing (not the 
sort type of stuff), and given that I'm a Postgres enthusiast I asked myself: 
my server is able to sort around 500 million integers per second; if Postgres 
were able to do that as well, it would be very nice.

What can I say? Sorry for my thoughts.

 I had a patch committed a few hours ago that almost included the 
 capability of assigning an alternative sorting function, but only one 
 with the exact same signature as my variant of qsort_arg. pg_qsort 
 isn't used to sort tuples at all, by the way.

Then I did look in the wrong direction. Thank you for pointing that out.

 Threading building blocks is not going to form the basis of any novel 
 sorting implementation, because comparators in general are not thread 
 safe, and it isn't available on all the platforms we support, and 
 because of how longjmp interacts with C++ stack unwinding and so on 
 and so on. Now, you could introduce some kind of parallelism into 
 sorting integers and floats, but that's an awful lot of work for a 
 marginal reward.

The TBB was just an example that came to mind.
What do you mean by "you could introduce some kind of parallelism"?
As far as I know, any divide-and-conquer algorithm can be parallelized.
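As an aside, the divide-and-conquer point can be made concrete with a small sketch (an illustration only, not proposed Postgres code; threads are used here for brevity, and in CPython the GIL limits real CPU speedup, but the fork/join structure is the same with processes or a GPU):

```python
from concurrent.futures import ThreadPoolExecutor
from heapq import merge

def parallel_merge_sort(data, executor, depth=2):
    # Base case: small input or no parallel budget left.
    if depth == 0 or len(data) < 2:
        return sorted(data)
    mid = len(data) // 2
    # Divide: sort the left half on another thread, the right half here.
    left = executor.submit(parallel_merge_sort, data[:mid], executor, depth - 1)
    right = parallel_merge_sort(data[mid:], executor, depth - 1)
    # Conquer: merge the two sorted halves.
    return list(merge(left.result(), right))

with ThreadPoolExecutor(max_workers=4) as pool:
    print(parallel_merge_sort([9, 3, 7, 1, 8, 2], pool))  # [1, 2, 3, 7, 8, 9]
```

The `depth` cap bounds how many concurrent tasks exist, so a fixed-size pool cannot deadlock waiting on its own children.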

Radix sorting can be used for any data type, if you create a callback that 
provides the most significant bits in fixed-width buckets.  At any rate, I can't 
imagine why anyone would want to complain about sorting 40 times faster than 
before, considering the amount of time databases spend ordering data.
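The callback idea can be sketched in a few lines of Python (a hedged illustration, not PostgreSQL code; the key-extraction interface is an assumption made up for the example):

```python
import struct

def radix_sort(items, key_bits, width=32, bucket_bits=8):
    """LSD radix sort: distribute items by successive bit groups of the key."""
    mask = (1 << bucket_bits) - 1
    for shift in range(0, width, bucket_bits):
        buckets = [[] for _ in range(1 << bucket_bits)]
        for item in items:
            # key_bits() is the per-type callback: value -> unsigned int key.
            buckets[(key_bits(item) >> shift) & mask].append(item)
        # Stable concatenation preserves the order set by lower bit groups.
        items = [item for bucket in buckets for item in bucket]
    return items

def float_key(x):
    # For non-negative IEEE-754 floats, the raw bit pattern orders the same
    # way as the value, so it can serve directly as the radix key.
    return struct.unpack("<I", struct.pack("<f", x))[0]

print(radix_sort([3.5, 0.25, 2.0, 1.75], float_key))  # [0.25, 1.75, 2.0, 3.5]
```

Any type whose callback yields an order-preserving fixed-width integer key sorts this way without ever calling a comparator.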

I have a Cuda card in this machine (NVIDIA GeForce GTX 460) and I would not 
mind it a bit if my database ORDER BY clause suddenly started running ten 
times faster than before when I am dealing with a huge volume of data.

There have been other experiments along these lines, such as:
GPU-based Sorting in PostgreSQL, Naju Mancheril, School of Computer Science, 
Carnegie Mellon University
www.cs.virginia.edu/~skadron/Papers/bakkum_sqlite_gpgpu10.pdf (this is for 
SQLite, but the grammar of SQLite is almost a pure subset of PostgreSQL's, 
including things like VACUUM...)
http://wiki.postgresql.org/images/6/65/Pgopencl.pdf
http://dl.acm.org/citation.cfm?id=1807207
http://www.scribd.com/doc/51484335/PostgreSQL-OpenCL-Procedural-Language-pgEast-March-2011

See also
http://highscalability.com/scaling-postgresql-using-cuda




-- 
Sent via pgsql-hackers mailing list 

Re: [HACKERS] What Would You Like To Do?

2011-09-13 Thread Dann Corbit
From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Nolan
Sent: Tuesday, September 13, 2011 11:51 AM
To: Joshua D. Drake
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] What Would You Like To Do?


On Tue, Sep 13, 2011 at 12:26 PM, Joshua D. Drake 
j...@commandprompt.com wrote:

On 09/13/2011 10:13 AM, Michael Nolan wrote:
The lists all seem to be focusing on the things that the developers
would like to add to PostgreSQL, what about some things that users or
ISPs might like to have, and thus perhaps something that companies might
actually see as worth funding?

Well just my own two cents ... but it all depends on who is doing the funding. 
At this point 80% of the work CMD codes for Pg (or tertiary projects and 
modules) is funded by companies. So let's not assume that companies aren't 
funding things. They are.

But perhaps if a few 'commercial' features were on the wish list there would be 
more companies willing to fund development?   The developers get a bit of what 
they want to work on, the production users get a bit of what they need, 
everybody's happy.

For example:

A fully integrated ability to query across multiple databases, possibly
on multiple servers, something Oracle has had for nearly two decades.

That isn't the approach to take. The fact that Oracle has it is not a guarantee 
that it is useful or good. If you need to query across databases (assuming 
within the same cluster) then you designed your database wrong and should have 
used our SCHEMA support (what Oracle calls Namespaces) instead.

This is the difference between developers and real world users.  Real world 
users may not have the ability, time or resources to redesign their databases 
just because that's the 'best' way to do something.  Will it be the most 
efficient way to do it?  Almost certainly not.

I've been involved in a few corporate mergers, and there was a short term need 
to do queries on the combined databases while the tiger team handling the IT 
restructuring figured out how (or whether) to merge the databases together.  (One 
of these happened to be an Oracle/Oracle situation, it was a piece of cake even 
though the two data centers were 750 miles apart and the table structures had 
almost nothing in common.  Another was a two week headache, the third was even 
worse!)

In a perfect world, it would be nice if one could do combined queries linking a 
PostgreSQL database with an Oracle one, or a MySQL one, too.  Because 
sometimes, that's what you gotta do.  Even something that is several hundred 
times slower is going to be faster than merging the databases together.  When I 
do this today, I have to write a program (in perl or php) that accesses both 
databases and merges it by hand.
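For what it's worth, the "merge it by hand" approach can be sketched in any scripting language; here is a hedged Python version using sqlite3 as a stand-in for the two servers (all table and column names are invented for the example):

```python
import sqlite3

# Two independent "servers" (in-memory databases, invented schemas).
crm = sqlite3.connect(":memory:")
hr = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Acme"), (2, "Globex")])
hr.execute("CREATE TABLE reps (customer_id INTEGER, rep TEXT)")
hr.executemany("INSERT INTO reps VALUES (?, ?)",
               [(1, "Alice"), (2, "Bob")])

# Client-side hash join: build a lookup from one source, probe with the other.
reps = dict(hr.execute("SELECT customer_id, rep FROM reps"))
joined = [(name, reps[cid])
          for cid, name in crm.execute("SELECT id, name FROM customers")
          if cid in reps]
print(sorted(joined))  # [('Acme', 'Alice'), ('Globex', 'Bob')]
```

This is exactly the work a cross-database query facility would do for you, which is why doing it by hand for every pair of systems gets old quickly.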

Microsoft uses Linked Servers.
DB/2 uses DB/2 Connect.
Informix uses Informix Connect.
Etc.

At CONNX, our product suite provides this ability generically from any data 
source collection.  It is obvious why such a thing is utterly mandatory for 
every large business.  For example:
The business purchases a CRM system for customer relationship management like 
SAP.
The business purchases a HCM system for Human Capital Management like 
Peoplesoft.
The business purchases a Manufacturing system like MAXIM for their 
manufacturing systems.
Etc., etc., etc.

Some of these systems may have the same database type, but it is highly 
unlikely that every solution to a business problem in the entire organization 
uses the same underlying database.
People buy or build software systems to solve their business problems.  There 
is a low probability that each and every business problem was solved by the 
same sets of tools from the same vendors.
Therefore, the ability to process queries across heterogeneous systems is a 
fundamental business need.

The larger the company the more database systems you will find.  But even 
teeny-tiny organizations tend to have several different database systems needed 
to run their business.

{snip}



[HACKERS] What was the exact version of PostgreSQL where the column name length changed from 31 to 63 characters?

2011-04-26 Thread Dann Corbit
I need to know so that I can handle cases like:

Create table foolongcols(
nevermindthefurthermorejustpleadinselfdefense char(5), 
nevermindthefurthermorejustpleadguilty char(5)
);

I assume that other object names (table name, function name, etc.) are 
similarly affected.  Is that correct?
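For illustration (this is not PostgreSQL source, just a sketch of the consequence): PostgreSQL truncates identifiers to NAMEDATALEN - 1 bytes, so at the old 31-character limit the two column names above collide, while at 63 they remain distinct:

```python
# Identifiers keep NAMEDATALEN - 1 bytes; these names are plain ASCII,
# so bytes == characters for this example.
def truncate(name, limit):
    return name[:limit]

cols = ["nevermindthefurthermorejustpleadinselfdefense",
        "nevermindthefurthermorejustpleadguilty"]

old = {truncate(c, 31) for c in cols}   # old 31-character limit: collision
new = {truncate(c, 63) for c in cols}   # 63-character limit: still distinct
print(len(old), len(new))  # 1 2
```

The two names share their first 32 characters, which is why the old limit collapses them into one identifier.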




Re: [HACKERS] What was the exact version of PostgreSQL where the column name length changed from 31 to 63 characters?

2011-04-26 Thread Dann Corbit
 -Original Message-
 From: Jeff Davis [mailto:pg...@j-davis.com]
 Sent: Tuesday, April 26, 2011 11:44 AM
 To: Dann Corbit
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] What was the exact version of PostgreSQL where
 the column name length changed from 31 to 63 characters?
 
 On Tue, 2011-04-26 at 18:35 +, Dann Corbit wrote:
  I need to know so that I can handle cases like:
 
  Create table foolongcols(
  nevermindthefurthermorejustpleadinselfdefense char(5),
  nevermindthefurthermorejustpleadguilty char(5)
  );
 
  I assume that other object names (table name, function name, etc.)
 are similarly affected.  Is that correct?
 
 It was changed in this commit:
 
 http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=46bb23
 ac016714065711cf2a780e080c7310d66e
 
 which was first released in 7.3.0, as far as I can tell.

Thanks (and also to Merlin Moncure) for finding this information.




Re: [HACKERS] ORDER BY 1 COLLATE

2011-04-18 Thread Dann Corbit
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Andrew Dunstan
 Sent: Monday, April 18, 2011 1:43 PM
 To: Tom Lane
 Cc: Peter Eisentraut; pgsql-hackers
 Subject: Re: [HACKERS] ORDER BY 1 COLLATE
 
 On 04/18/2011 04:20 PM, Tom Lane wrote:
  Peter Eisentraut <pete...@gmx.net> writes:
  This came from a review by Noah Misch a great while ago:
  test=> SELECT b FROM foo ORDER BY 1 COLLATE "C";
  ERROR:  42804: collations are not supported by type integer
  According to SQL92, this should be supported.  Do we want to bother?  It
  doesn't look hard to fix, so it's really only a question of whether this
  would be useful, or its absence would be too confusing.
  The ORDER BY 1 business seems to me to be legacy anyway.  I'm not
  inclined to put in even more hacks to make strange combinations work
  there --- I think we're likely to find ourselves painted into a corner
  someday as it is.
 
 
 
 It's likely to be used by SQL generators if nothing else, and I've been
 known to use it as a very convenient shorthand. It would seem to me like
 quite a strange inconsistency to allow ORDER BY n with some qualifiers
 but not others.

I use order by result_set_column_number a lot, especially when 
result_set_column is a complicated expression.



Re: [HACKERS] [GENERAL] Postgres 9.1 - Release Theme

2011-04-01 Thread Dann Corbit
Smells like April first to me.
http://en.wikipedia.org/wiki/April_Fools'_Day


From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Rajasekhar Yakkali
Sent: Friday, April 01, 2011 10:08 AM
To: dp...@postgresql.org
Cc: pgsql-gene...@postgresql.org; pgsql-hackers@postgresql.org
Subject: Re: [GENERAL] Postgres 9.1 - Release Theme

Following a great deal of discussion, I'm pleased to announce that the
PostgreSQL Core team has decided that the major theme for the 9.1
release, due in 2011, will be 'NoSQL'.

... the intention is to remove SQL support from
Postgres, and replace it with a language called 'QUEL'. This will
provide us with the flexibility we need to implement the features of
modern NoSQL databases. With no SQL support there will obviously be
some differences in the query syntax that must be used to access your
data. 


hmm.. quite a shock; this shift for 9.1, due in mid-2011, is unexpectedly soon 
:)

Curious to understand:

- how this relates to every feature that is provided at the moment based on the 
RDBMS paradigm: ACID compliance, support for the features provided by SQL, 
referential integrity, joins, caching, etc.

- also, does this shift assume that all the use cases fit data access patterns 
similar to Facebook/Twitter, or aim to address the likes of those?

Thanks,
Raj


Re: [HACKERS] would hw acceleration help postgres (databases in general) ?

2010-12-10 Thread Dann Corbit
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Hamza Bin Sohail
 Sent: Friday, December 10, 2010 3:10 PM
 To: pgsql-hackers@postgresql.org
 Subject: [HACKERS] would hw acceleration help postgres (databases in
 general) ?
 
 
 Hello hackers,
 
 I think i'm at the right place to ask this question.
 
 Based on your experience and the fact that you have written the Postgres
 code, can you tell what a rough break-down - in your opinion - is for the
 time the database spends just fetching and writing stuff to memory versus
 the actual computation? The reason I ask this is because of late there
 has been a push to put reconfigurable hardware on processor cores. What
 this means is that database writers can possibly identify the
 compute-intensive portions of the code and write hardware accelerators
 and/or custom instructions and offload computation to these hardware
 accelerators, which they would have programmed onto the FPGA.
 
 There is not much utility in doing this if there aren't considerable
 compute-intensive operations in the database (which I would be surprised
 if true). I would suspect joins, complex queries, etc. may be very
 compute-intensive. Please correct me if I'm wrong. Moreover, if you were
 told that you have reconfigurable hardware which can perform pretty
 complex computations 10x faster than the base, would you think about
 synthesizing it directly on an FPGA and using it?
 
 I'd be more than glad to hear your guesstimates.

Here is a sample project:
http://www.cs.virginia.edu/~skadron/Papers/bakkum_sqlite_gpgpu10.pdf
And another:
http://www.cs.cmu.edu/afs/cs.cmu.edu/Web/People/ngm/15-823/project/Final.pdf




Re: [HACKERS] Proposal: First step towards Intelligent, integrated database

2010-12-01 Thread Dann Corbit
I am probably just being thick, but how is your idea different from CREATE 
DOMAIN?
http://www.postgresql.org/docs/current/static/sql-createdomain.html


From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of ghatpa...@vsnl.net
Sent: Wednesday, December 01, 2010 2:20 AM
To: pgsql hackers
Subject: [HACKERS] Proposal: First step towards Intelligent, integrated database
Importance: High

Hello,

Here is the proposal: my first step towards an intelligent, integrated 
database. I am explaining the proposal with the use of an example.
Example: we will have a master table, say CustMast, and a transaction table, 
say SalesOrder.
View of CustMast:
CustCode       Number(5),
CustName       Varchar(30),
CustAdrsLine1  Varchar,
CustAdrsLine2  Varchar,  etc.
View of SalesOrder:
Sordno         Number(8),
Sorddt         Date,
CustCode       Number(5)  -  present way of defining.
Proposed way is:
CustCode Object CustMast. --- A new data type called 'O' (Object) is to be 
introduced, and the CREATE TABLE definition modified so that whenever the data 
type is 'O' it accepts an object name (in this case a table name). Here I want 
to inform the data definition that the field and its data type are already 
defined in the master table, that the same data type is used here, and that 
both tables are linked by this field.
We will be using same field name in both tables if not along with table name 
field name is to be accepted in create table definition.
Advantages:

1.   Now the database knows that custcode in salesorder is a foreign key, so a 
table constraint can be created. It also knows that an index is to be created 
on this field.

2.   In present situation select statement for selecting values from join 
of both tables will be

Select sordno, sorddt, custcode, custname, custadrsline1

from salesorder, custmast

where salesorder.custcode=custmast.custcode.

3.   In proposed way we can write this statement as:

Select sordno, sorddt, custcode, custname, custadrsline1

from salesorder (with proper changes in the program which picks up values from 
the select statement).

4.   Field can be from another table in same database or from Excel sheet 
column.

5.   Views need not be created as all tables are properly linked with each 
other in an application.

6.   This is only first step and many advantages can be a result of 
brainstorm.

7.   This will change RDBMS, tools, and ERP to the next generation.



For any clarifications, please contact me. Please give your feedback.



Regards Many,

Vijay Ghatpande.

Cell: +91 9822456142.


Re: [HACKERS] Implementation of Date/Time Input Interpretation

2010-06-22 Thread Dann Corbit
From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Francis Markham
Sent: Tuesday, June 22, 2010 7:13 PM
To: pgsql-hackers@postgresql.org
Subject: [HACKERS] Implementation of Date/Time Input Interpretation

Greetings all,

I am currently implementing a script to import data into postgres.  I would 
like to apply the algorithm to detect date and time values, outlined at 
http://developer.postgresql.org/pgdocs/postgres/datetime-input-rules.html

However, I am unfamiliar (and somewhat intimidated) by the postgres source 
tree.  Would any kind person be able to point me to the source file(s) that 
implement the above algorithm?

You will find it under \src\backend\utils\adt\datetime.c
To import data into postgres, I guess that reading the date time routine is 
probably not what you want to do.
If you want to move the data in using a compiled program then use an ODBC 
driver.  PostgreSQL comes with a free one.  OLEDB is another sensible 
alternative.  Or JDBC if you want to use Java.
If you want to bulk load lots of data at high speed, read up on the COPY 
command.
If you just want to insert some rows using SQL, then simply perform an INSERT 
using PSQL  or some other interface of your choice.
What is it exactly that you are trying to accomplish?
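If bulk loading via COPY turns out to be the route taken, one common pattern (a sketch under assumptions, not the only way) is to render rows into COPY's text format (tab-separated fields, \N for NULL, backslash escapes) and stream that to the server, for example via psycopg2's copy_expert if that driver is used. The formatting itself is plain Python:

```python
def copy_field(value):
    """Render one value in COPY text format."""
    if value is None:
        return r"\N"  # COPY's default NULL marker
    s = str(value)
    # Escape the backslash first so escapes added below aren't re-escaped.
    for ch, esc in (("\\", r"\\"), ("\t", r"\t"), ("\n", r"\n"), ("\r", r"\r")):
        s = s.replace(ch, esc)
    return s

def copy_line(row):
    # One input row becomes one tab-separated, newline-terminated line.
    return "\t".join(copy_field(v) for v in row) + "\n"

# Embedded tab and NULL survive the round trip as \t and \N.
print(repr(copy_line((1, "O'Hara\tCo", None))))
```

Lines produced this way can be fed to `COPY tablename FROM STDIN` as a file-like stream.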



Re: [HACKERS] Implementation of Date/Time Input Interpretation

2010-06-22 Thread Dann Corbit
 -Original Message-
 From: Andrew Dunstan [mailto:and...@dunslane.net]
 Sent: Tuesday, June 22, 2010 7:47 PM
 To: Francis Markham
 Cc: Dann Corbit; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Implementation of Date/Time Input Interpretation
 
 
 
 Francis Markham wrote:
  Thank you for your prompt reply.
 
 
  What is it exactly that you are trying to accomplish?
 
 
  I want to be able to, from my own script, determine if postgres will
  be able to interpret a string as a date or time.  If you can suggest
 a
  better way of accomplishing this beyond reimplementing your algorithm
  I would be happy to hear it!
 
 
 
 Call the appropriate input function in plpgsql and trap a data
 exception? These routines are going to be quite hard to mimic, I
 suspect. Getting postgres to do the work for you is probably a better
 way to go if you can.

For the O.P.:

This is the specification of the input format that is needed for date/time 
values:
http://www.postgresql.org/docs/8.4/interactive/datatype-datetime.html
See also:
http://www.postgresql.org/docs/8.4/interactive/datetime-appendix.html

If he wants to be able to simply validate date/time values before insertion, I 
would suggest a package like libmcal and pick out the file datetime.c, in 
particular:
extern bool datevalid(int year,int mon,int mday);
extern bool timevalid(int hour,int min,int sec);

The PostgreSQL database routine has lots of fluff intended for interfacing with 
the database, etc. which makes a simpler approach easier if validation is what 
is wanted.  Of course date/time/calendar libraries are available in just about 
every language.
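As a rough standard-library analogue of those datevalid()/timevalid() checks (a sketch, not the libmcal implementation):

```python
import datetime

def datevalid(year, mon, mday):
    # Lean on the Gregorian calendar rules built into datetime.date,
    # including leap-year handling.
    try:
        datetime.date(year, mon, mday)
        return True
    except ValueError:
        return False

def timevalid(hour, minute, sec):
    # Simple range checks; leap seconds are deliberately not accepted here.
    return 0 <= hour < 24 and 0 <= minute < 60 and 0 <= sec < 60

print(datevalid(2010, 2, 29), timevalid(23, 59, 59))  # False True
```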




Re: [HACKERS] GSoC

2010-03-29 Thread Dann Corbit
Maybe this can be helpful:
http://sourceforge.net/search/?type_of_search=soft&words=fingerprint
http://sourceforge.net/search/?type_of_search=soft&words=image+recognition


 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Anindya Jyoti Roy
 Sent: Monday, March 29, 2010 1:04 PM
 To: pgsql-hackers@postgresql.org
 Subject: [HACKERS] GSoC
 
 I have some idea of implementing am image database system, which will
 let you search against the image and fingerprint:
 
 The idea crudely is the following:
 
 I want to implement an image database system. This will have the
 following qualities:
 1 it will store images along with the other attributes
 2 the database search engine will be able to search for images also
 3 it will list the matching images in the order of degree of match
 4 in this matching system I will likely use the approach of dividing the
 image into important parts and matching them
 5 the database will also contain fingerprints, which may be the primary
 key
 6 it will match the fingerprints on the basis of the nodes, then make a
 bitmap and, using the coordinates with some error tolerance, match them
 7 to include some accuracy in case of the angle of the picture, in a
 search for equality we will introduce some logical amount of error
 8 this project can be coded by me as I have some previous experience in
 this type of project
 9 this will make a definitely good search engine, as well as an
 innovative and uncommon one
 
 
 
 About me:
 
 I am a sophomore of the Indian Institute of Technology, Kanpur, in CSE
 BTech. I have a good academic record till date and a keen interest in
 Data Mining and Machine Language, as well as Image Processing. I am
 attaching my resume in short form with this mail, hoping it may help you
 to judge my credibility. I am strong in Coding and Maths and able to
 solve problems within a given timeline.
 
 I can give a more detailed explanation of my project, but I just wanted
 to give an outline here.
 
 Let me know the feedback.
 Thanks for your time,
 
 Your faithfully,
 
  Anindya Jyoti Roy
  Sophomore
  CSE BTech
  IIT Kanpur



Re: [HACKERS] OpenVMS?

2010-02-16 Thread Dann Corbit
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Rayson Ho
 Sent: Tuesday, February 16, 2010 9:39 AM
 To: David Fetter
 Cc: Andrew Dunstan; PG Hackers
 Subject: Re: [HACKERS] OpenVMS?
 
 On Tue, Feb 16, 2010 at 11:22 AM, David Fetter wrote:
     * Shell access from several accounts
     * Git or cvs client
     * Compiler tools
     * Perl of a fairly recent vintage
     * Outbound http access
 
 I had access to the HP testdrive before they closed it down (the Unix
 servers were down in Sept 08, but the VMS cluster was running till Jan
 2010). They blocked all outbound internet access to the testdrive
 servers -- only telnet and ftp were allowed.
 
 Count me in for the OpenVMS porting effort (but I guess I will apply
 for an account separately as I might port other things to OpenVMS in
 the future). I believe the porting effort is larger than a new Unix
 port but smaller than the Windows port, as most of the Unix and POSIX
 functions and system calls are supported on OpenVMS.
 
 MySQL has around 10 functions changed or written specifically for
 OpenVMS; most of those are related to utime(), $UMASK & $UMASKDIR,
 open(), and Unix pathnames. (I think utime() support was added a few
 years ago to OpenVMS 7.3 & 8.0 -- so maybe the code was added to
 MySQL for earlier VMS versions.)
 
 PostgreSQL uses fork(), which is not supported on OpenVMS. However,
 the techniques used by the WIN32 version of internal_forkexec() in
 src/backend/postmaster/postmaster.c give the VMS version a good
 starting point.

For PostgreSQL, you will probably want to use LIB$SPAWN() as a rough equivalent 
to CreateProcess() on Windows
http://www.sysworks.com.au/disk$vaxdocsep002/opsys/vmsos721/5932/5932pro_041.html
 
 Lastly, are we going to support ODS-2 disks?? And are we going to
 require GNV installed for building and running PostgreSQL on OpenVMS??




[HACKERS] MonetDB test says that PostgreSQL often has errors or missing results

2010-01-19 Thread Dann Corbit
See:
http://monetdb.cwi.nl/SQL/Benchmark/TPCH/
If the result is correct, then the problem queries should be added to
the regression test suite.
If the result is not correct, then perhaps they could get assistance on
proper configuration of PostgreSQL and rerun the tests.





[HACKERS] Source code for pg_bulkload

2010-01-14 Thread Dann Corbit
How can I obtain the source code for pg_bulkload?

I am interested in writing an API version, so that I can embed this
loading facility into programs.

The page http://pgfoundry.org/projects/pgbulkload/ has links to the
binaries, but I want the source.




Re: [HACKERS] planner or statistical bug on 8.5

2010-01-12 Thread Dann Corbit
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Pavel Stehule
 Sent: Monday, January 11, 2010 11:55 PM
 To: PostgreSQL Hackers
 Cc: Tom Lane
 Subject: [HACKERS] planner or statistical bug on 8.5
 
 Hello
 
 I checked query and I was surprised with very strange plan:
 
 postgres=# create table a(a int primary key);
 NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
 "a_pkey" for table "a"
 CREATE TABLE
 postgres=# create table b(b int primary key);
 NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
 "b_pkey" for table "b"
 CREATE TABLE
 postgres=# create table c(c int primary key);
 NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
 "c_pkey" for table "c"
 CREATE TABLE
 postgres=# ANALYZE ;
 ANALYZE
 postgres=# explain select a, b from a, b, c;
                             QUERY PLAN
 ----------------------------------------------------------------------
  Nested Loop  (cost=0.00..276595350.00 rows=1382400 width=8)
    ->  Nested Loop  (cost=0.00..115292.00 rows=576 width=8)
          ->  Seq Scan on a  (cost=0.00..34.00 rows=2400 width=4)
          ->  Materialize  (cost=0.00..82.00 rows=2400 width=4)
                ->  Seq Scan on b  (cost=0.00..34.00 rows=2400 width=4)
    ->  Materialize  (cost=0.00..82.00 rows=2400 width=0)
          ->  Seq Scan on c  (cost=0.00..34.00 rows=2400 width=0)
 (7 rows)

You have no join columns, so it is a simple Cartesian product.

Perhaps you meant something like this:
EXPLAIN SELECT a.a, b.b, c.c FROM a a, b b, c c WHERE a.a = b.b AND a.a = c.c





Re: [HACKERS] boolean in C

2009-07-16 Thread Dann Corbit
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Bernd Helmle
 Sent: Thursday, July 16, 2009 8:47 AM
 To: Grzegorz Jaskiewicz
 Cc: pgsql-hackers Hackers
 Subject: Re: [HACKERS] boolean in C
 
 --On 16. Juli 2009 13:32:03 +0100 Grzegorz Jaskiewicz
 g...@pointblue.com.pl
 wrote:
 
  oh, another thing.
  stdbool is C99 standard feature. Not gcc extension.
 
 There might be compiler versions out there which claim to be C99 but
 do not provide fully compliant include headers. Sun Studio 12 at least
 has the following in its documentation, as a quick search brings up:
 
 Though the compiler defaults to supporting the features of C99 listed
 below, standard headers provided by the Solaris software in
 /usr/include do
 not yet conform with the 1999 ISO/IEC C standard

It's more or less a generic problem.  There is only a handful of fully
functional C99 compilers[0], and all the others have some C99 features
to one degree or another.

Microsoft's compiler is particularly abysmal, but then again, they have
no claims of C99 compliance so there is nothing to complain about there.
Those few features that they do implement are implemented in a
non-standard way.

GCC is also only partially compliant[1].  

I believe that the Dinkum library is the only certified C99 standard
library[2] as well.

[0] see: http://www.peren.com/pages/branding_set.htm 
[1] see: http://gcc.gnu.org/c99status.html
[2] see: http://www.dinkumware.com/manuals/

I don't think that a product (that is expected to run on as many
platforms as PostgreSQL is expected to run on) is even possible to write
in C99 code because there are not enough compliant compilers available.

IMO-YMMV




Re: [HACKERS] Throw some low-level C scutwork at me

2009-05-02 Thread Dann Corbit
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Robert Haas
 Sent: Friday, May 01, 2009 6:00 PM
 To: Andy Lester
 Cc: Dimitri Fontaine; PostgreSQL-development
 Subject: Re: [HACKERS] Throw some low-level C scutwork at me
 
 On Fri, May 1, 2009 at 4:35 PM, Andy Lester a...@petdance.com wrote:
  There should be nothing to maintain, if it's done right.
 
  Any line in the source tree will have to get maintained, or why
 would you
  spend any time writing it?
 
  I meant by hand.
 
  See doc/FAQ_DEV and those specific lines:
 
  I see no such file.  Perhaps it doesn't get exported into the git
 mirror?
 
  Sorry if the push-back has been read as harsh, but I've got the
 (very
  personal) feeling that to become a contributor to PostgreSQL, you
 *will*
  have to be able to read this level of criticism back from the mail
 you send.
 
  I'm all for criticism of ideas.  I wish there had been some in Tom's
  original mail.
 
 OK, so, when I initially started catching up on this thread, I was
 kind of feeling annoyed at Tom, and I still wish he'd say something
 along the lines of "I did not mean to give offense and I'm sorry if my
 words came across in a way that I did not intend" rather than just
 explaining why he reacted the way he did.

I think it's just Tom's way.

Higgins (from Pygmalion):
The great secret, Eliza, is not having bad manners or good manners or any other 
particular sort of manners, but having the same manner for all human souls: in 
short, behaving as if you were in Heaven, where there are no third-class 
carriages, and one soul is as good as another.

For comparison, here is a recent message from Tom addressed to me, along with 
my response:
=
 -Original Message-
 From: Tom Lane [mailto:t...@sss.pgh.pa.us]
 Sent: Tuesday, April 28, 2009 3:21 PM
 To: Dann Corbit
 Cc: Andrew Dunstan; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Building Postgresql under Windows question
 
 Dann Corbit dcor...@connx.com writes:
  From: Andrew Dunstan [mailto:and...@dunslane.net]
 
  Why won't
  PQstatus(conn) == CONNECTION_OK
  be true and thus the code will succeed without requiring a password?
 
  It returns the value CONNECTION_STARTED
 
 It certainly shouldn't.  You're effectively asserting that PQconnectdb
 is broken for everyone on every platform, which is demonstrably not the
 case.  Are you fooling with modified libpq code by any chance?

No.
The service works correctly when I use password authentication.
The service does not work correctly when I use trust.

What happens when you use trust in pg_hba.conf?
=

Now, I do not think that Tom is a bad person at all.  Quite the contrary, he's 
clearly very smart and also goes to great lengths in attempts to be helpful 
(sometimes chewing someone out is being helpful as well).  He sometimes comes 
off as brusque -- but to some degree that comes from reading between the lines 
and being oversensitive.

I think the lesson to be learned here is that when making any sort of message 
sent to the internet, the most sensible policy is to grow a skin at least five 
inches thick.

I am certainly glad that Tom is on the PostgreSQL team.  I expect that the 
product will come out ahead from it in the long run.  And as for being 
offended, I am reminded of a Bible proverb:
(Ecclesiastes 7:9) Do not hurry yourself in your spirit to become offended, for 
the taking of offense is what rests in the bosom of the stupid ones.


 That having been said, as
 far as I can tell, your feeling that Tom said something rude is based
 largely on the fact that he used the word "sucked", and perhaps the
 phrase "rejected out of hand".  Admittedly, Tom could have described
 why he thought it sucked rather than just saying that it did, and he
 could have said that he would vote against accepting it and believed
 that others would not like it either rather than phrasing it in the
 way that he did.
 
 Then again, you didn't offer any justification for your desire to have
 them in there either.  You didn't ask whether they'd been previously
 considered or whether the community would find them desirable.  You
 didn't make an argument for why they'd be better than the system
 currently in use or even, at least as far as can be determined from
 reading the email that set off this flame war, take the time to
 understand that system before proposing your own.
 
 One thing I have discovered about pgsql-hackers is that it is very
 easy to be accused of not having done your homework even if you
 actually have.  I have seen more than one well-thought-out proposal
 shot down by a committer who (as it seemed to me) had thought it
 through less carefully than the person proposing it.  On the other
 hand, there are five or ten half-baked ideas for every good one, so
 the committers have

[HACKERS] Building Postgresql under Windows question

2009-04-28 Thread Dann Corbit
We are still having trouble with the service controller pg_ctl.exe
vanishing after some period of time.

Hence I am doing debug builds from the source tree according to the
instructions found at:
http://developer.postgresql.org/pgdocs/postgres/install-win32-full.html

Specifically, I am using this method:
It is also possible to build from inside the Visual Studio GUI. In this
case, you need to run: 
 perl mkvcbuild.pl

There are a few projects that will not build (only 7 out of 99).

I have not installed ossp-uuid yet, so one of the failures is expected.
I am not concerned about the CONTRIB projects at all.
However, I do want to know what the internationalization failures mean.
Here is the output of the debug build process from the Visual Studio IDE
(includes all failures):

1-- Build started: Project: utf8_and_euc_kr, Configuration: Debug
Win32 --
1Generate DEF file
1Not re-generating UTF8_AND_EUC_KR.DEF, file already exists.
1Linking...
1utf8_and_euc_kr.def : error LNK2001: unresolved external symbol
euc_jp_to_utf8
1utf8_and_euc_kr.def : error LNK2001: unresolved external symbol
pg_finfo_euc_jp_to_utf8
1utf8_and_euc_kr.def : error LNK2001: unresolved external symbol
pg_finfo_utf8_to_euc_jp
1utf8_and_euc_kr.def : error LNK2001: unresolved external symbol
utf8_to_euc_jp
1Debug\utf8_and_euc_kr\utf8_and_euc_kr.lib : fatal error LNK1120: 4
unresolved externals
2-- Build started: Project: utf8_and_euc_cn, Configuration: Debug
Win32 --
3-- Build started: Project: euc_kr_and_mic, Configuration: Debug
Win32 --
1Build log was saved at
file://c:\dcorbit64\postgresql\postgresql-8.3.7\Debug\utf8_and_euc_kr\B
uildLog.htm
1utf8_and_euc_kr - 5 error(s), 0 warning(s)
2Generate DEF file
3Generate DEF file
3Not re-generating EUC_KR_AND_MIC.DEF, file already exists.
2Not re-generating UTF8_AND_EUC_CN.DEF, file already exists.
3Linking...
2Linking...
3euc_kr_and_mic.def : error LNK2001: unresolved external symbol
euc_jp_to_mic
2utf8_and_euc_cn.def : error LNK2001: unresolved external symbol
koi8r_to_utf8
2utf8_and_euc_cn.def : error LNK2001: unresolved external symbol
pg_finfo_koi8r_to_utf8
3euc_kr_and_mic.def : error LNK2001: unresolved external symbol
euc_jp_to_sjis
2utf8_and_euc_cn.def : error LNK2001: unresolved external symbol
pg_finfo_utf8_to_koi8r
3euc_kr_and_mic.def : error LNK2001: unresolved external symbol
mic_to_euc_jp
2utf8_and_euc_cn.def : error LNK2001: unresolved external symbol
utf8_to_koi8r
3euc_kr_and_mic.def : error LNK2001: unresolved external symbol
mic_to_sjis
2Debug\utf8_and_euc_cn\utf8_and_euc_cn.lib : fatal error LNK1120: 4
unresolved externals
3euc_kr_and_mic.def : error LNK2001: unresolved external symbol
pg_finfo_euc_jp_to_mic
3euc_kr_and_mic.def : error LNK2001: unresolved external symbol
pg_finfo_euc_jp_to_sjis
3euc_kr_and_mic.def : error LNK2001: unresolved external symbol
pg_finfo_mic_to_euc_jp
3euc_kr_and_mic.def : error LNK2001: unresolved external symbol
pg_finfo_mic_to_sjis
3euc_kr_and_mic.def : error LNK2001: unresolved external symbol
pg_finfo_sjis_to_euc_jp
3euc_kr_and_mic.def : error LNK2001: unresolved external symbol
pg_finfo_sjis_to_mic
3euc_kr_and_mic.def : error LNK2001: unresolved external symbol
sjis_to_euc_jp
3euc_kr_and_mic.def : error LNK2001: unresolved external symbol
sjis_to_mic
3Debug\euc_kr_and_mic\euc_kr_and_mic.lib : fatal error LNK1120: 12
unresolved externals
2Build log was saved at
file://c:\dcorbit64\postgresql\postgresql-8.3.7\Debug\utf8_and_euc_cn\B
uildLog.htm
2utf8_and_euc_cn - 5 error(s), 0 warning(s)
4-- Build started: Project: utf8_and_shift_jis_2004, Configuration:
Debug Win32 --
4Generate DEF file
3Build log was saved at
file://c:\dcorbit64\postgresql\postgresql-8.3.7\Debug\euc_kr_and_mic\Bu
ildLog.htm
3euc_kr_and_mic - 13 error(s), 0 warning(s)
4Not re-generating UTF8_AND_SHIFT_JIS_2004.DEF, file already exists.
5-- Build started: Project: uuid-ossp, Configuration: Debug Win32
--
5Compiling...
4Linking...
5uuid-ossp.c
4utf8_and_shift_jis_2004.def : error LNK2001: unresolved external
symbol pg_finfo_uhc_to_utf8
4utf8_and_shift_jis_2004.def : error LNK2001: unresolved external
symbol pg_finfo_utf8_to_uhc
4utf8_and_shift_jis_2004.def : error LNK2001: unresolved external
symbol uhc_to_utf8
4utf8_and_shift_jis_2004.def : error LNK2001: unresolved external
symbol utf8_to_uhc
4Debug\utf8_and_shift_jis_2004\utf8_and_shift_jis_2004.lib : fatal
error LNK1120: 4 unresolved externals
5.\contrib\uuid-ossp\uuid-ossp.c(27) : fatal error C1083: Cannot open
include file: 'uuid.h': No such file or directory
5Build log was saved at
file://c:\dcorbit64\postgresql\postgresql-8.3.7\Debug\uuid-ossp\BuildLo
g.htm
5uuid-ossp - 1 error(s), 0 warning(s)
4Build log was saved at
file://c:\dcorbit64\postgresql\postgresql-8.3.7\Debug\utf8_and_shift_ji
s_2004\BuildLog.htm
4utf8_and_shift_jis_2004 - 5 error(s), 0 warning(s)
6-- Build started: Project: hstore, Configuration: Debug Win32
--
6Generate DEF file
6Not 

Re: [HACKERS] Building Postgresql under Windows question

2009-04-28 Thread Dann Corbit
 -Original Message-
 From: Andrew Dunstan [mailto:and...@dunslane.net]
 Sent: Tuesday, April 28, 2009 12:42 PM
 To: Dann Corbit
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Building Postgresql under Windows question
 
 
 
 Dann Corbit wrote:
  Pg_ctl.exe is exiting with a success code from line 1946 of PG_CTL.C
 
  This is not appropriate behavior for a service unless shutdown has
 been
  requested.
 
 
 pg_ctl calls *StartServiceCtrlDispatcher*(). It can only get to the
 line
 you mention when called as a service after that call returns. MSDN
 states:
 
 If *StartServiceCtrlDispatcher* succeeds, it connects the calling
 thread to the service control manager and does not return until all
 running services in the process have entered the SERVICE_STOPPED
 state.
 
 So it appears that something is causing your service to enter that
 state.

It is interesting that it happens even if I run no queries at all.

This is the only reference to the service control dispatcher I can find
in pg_ctl.c:

static void
pgwin32_doRunAsService(void)
{
SERVICE_TABLE_ENTRY st[] = {{register_servicename,
pgwin32_ServiceMain},
{NULL, NULL}};

if (StartServiceCtrlDispatcher(st) == 0)
{
write_stderr(_("%s: could not start service \"%s\": "
"error code %d\n"), progname, register_servicename, (int)
GetLastError());
exit(1);
}
}

BTW, the exit(1) calls should be exit(EXIT_FAILURE), though in practice there
will be no difference on any POSIX system.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Building Postgresql under Windows question

2009-04-28 Thread Dann Corbit
 -Original Message-
 From: Andrew Dunstan [mailto:and...@dunslane.net]
 Sent: Tuesday, April 28, 2009 12:42 PM
 To: Dann Corbit
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Building Postgresql under Windows question
 
 
 
 Dann Corbit wrote:
  Pg_ctl.exe is exiting with a success code from line 1946 of PG_CTL.C
 
  This is not appropriate behavior for a service unless shutdown has
 been
  requested.
 
 
 pg_ctl calls *StartServiceCtrlDispatcher*(). It can only get to the
 line
 you mention when called as a service after that call returns. MSDN
 states:
 
 If *StartServiceCtrlDispatcher* succeeds, it connects the calling
 thread to the service control manager and does not return until all
 running services in the process have entered the SERVICE_STOPPED
 state.
 
 So it appears that something is causing your service to enter that
 state.

It appears that SERVICE_STOPPED comes from here:

static void WINAPI
pgwin32_ServiceMain(DWORD argc, LPTSTR * argv)
{
PROCESS_INFORMATION pi;
DWORD   ret;
DWORD   check_point_start;

/* Initialize variables */
status.dwWin32ExitCode = S_OK;
status.dwCheckPoint = 0;
status.dwWaitHint = 6;
status.dwServiceType = SERVICE_WIN32_OWN_PROCESS;
status.dwControlsAccepted = SERVICE_ACCEPT_STOP |
SERVICE_ACCEPT_SHUTDOWN | SERVICE_ACCEPT_PAUSE_CONTINUE;
status.dwServiceSpecificExitCode = 0;
status.dwCurrentState = SERVICE_START_PENDING;

memset(&pi, 0, sizeof(pi));

read_post_opts();

/* Register the control request handler */
if ((hStatus = RegisterServiceCtrlHandler(register_servicename,
pgwin32_ServiceHandler)) == (SERVICE_STATUS_HANDLE) 0)
return;

if ((shutdownEvent = CreateEvent(NULL, true, false, NULL)) ==
NULL)
return;

/* Start the postmaster */
pgwin32_SetServiceStatus(SERVICE_START_PENDING);
if (!CreateRestrictedProcess(pgwin32_CommandLine(false), &pi,
true))
{
pgwin32_SetServiceStatus(SERVICE_STOPPED);
return;
}
postmasterPID = pi.dwProcessId;
postmasterProcess = pi.hProcess;
CloseHandle(pi.hThread);

if (do_wait)
{
write_eventlog(EVENTLOG_INFORMATION_TYPE, _("Waiting for
server startup...\n"));
if (test_postmaster_connection(true) == false)
{
write_eventlog(EVENTLOG_INFORMATION_TYPE,
_("Timed out waiting for server startup\n"));
pgwin32_SetServiceStatus(SERVICE_STOPPED);
return;
}
write_eventlog(EVENTLOG_INFORMATION_TYPE, _("Server
started and accepting connections\n"));
}

/*
 * Save the checkpoint value as it might have been incremented
in
 * test_postmaster_connection
 */
check_point_start = status.dwCheckPoint;

pgwin32_SetServiceStatus(SERVICE_RUNNING);

/* Wait for quit... */
ret = WaitForMultipleObjects(2, shutdownHandles, FALSE,
INFINITE);

pgwin32_SetServiceStatus(SERVICE_STOP_PENDING);
switch (ret)
{
case WAIT_OBJECT_0: /* shutdown event */
kill(postmasterPID, SIGINT);

/*
 * Increment the checkpoint and try again Abort
after 12
 * checkpoints as the postmaster has probably
hung
 */
while (WaitForSingleObject(postmasterProcess,
5000) == WAIT_TIMEOUT && status.dwCheckPoint < 12)
status.dwCheckPoint++;
break;

case (WAIT_OBJECT_0 + 1):   /* postmaster
went down */
break;

default:
/* shouldn't get here? */
break;
}

CloseHandle(shutdownEvent);
CloseHandle(postmasterProcess);

pgwin32_SetServiceStatus(SERVICE_STOPPED);
}

I will set a breakpoint on every place that the status is set to
SERVICE_STOPPED and report what I have found.




Re: [HACKERS] Building Postgresql under Windows question

2009-04-28 Thread Dann Corbit
In this function:
static bool
test_postmaster_connection(bool do_checkpoint)

This code will never succeed:
snprintf(connstr, sizeof(connstr),
 "dbname=postgres port=%s connect_timeout=5",
portstr);

for (i = 0; i < wait_seconds; i++)
{
if ((conn = PQconnectdb(connstr)) != NULL &&
(PQstatus(conn) == CONNECTION_OK ||
 PQconnectionNeedsPassword(conn)))
{
PQfinish(conn);
success = true;
break;
}


Because pg_hba.conf has this:
# TYPE  DATABASEUSERCIDR-ADDRESS  METHOD

# IPv4 local connections:
hostall all 127.0.0.1/32  trust
# IPv6 local connections:
hostall all ::1/128   trust

to allow connections from the local machine.

(A password is not needed to connect, so the success test always fails.)
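For illustration, the success test's logic can be modeled in a few lines (a sketch with invented names; the real code uses libpq's PQstatus() and PQconnectionNeedsPassword()):

```python
# Hedged model of pg_ctl's startup test; the constants are arbitrary
# stand-ins for libpq's ConnStatusType enum values.
CONNECTION_OK, CONNECTION_STARTED, CONNECTION_BAD = range(3)

def startup_detected(status, needs_password):
    # Mirrors: PQstatus(conn) == CONNECTION_OK || PQconnectionNeedsPassword(conn)
    return status == CONNECTION_OK or needs_password

# Under "trust" no password is requested, so if the reported status is
# CONNECTION_STARTED rather than CONNECTION_OK, the test never passes.
print(startup_detected(CONNECTION_STARTED, needs_password=False))  # False
print(startup_detected(CONNECTION_STARTED, needs_password=True))   # True
```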





Re: [HACKERS] Building Postgresql under Windows question

2009-04-28 Thread Dann Corbit
 -Original Message-
 From: Andrew Dunstan [mailto:and...@dunslane.net]
 Sent: Tuesday, April 28, 2009 2:27 PM
 To: Dann Corbit
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Building Postgresql under Windows question
 
 
 
 Dann Corbit wrote:
  In this function:
  static bool
  test_postmaster_connection(bool do_checkpoint)
 
  This code will never succeed:
  snprintf(connstr, sizeof(connstr),
   "dbname=postgres port=%s connect_timeout=5",
  portstr);
 
  for (i = 0; i < wait_seconds; i++)
  {
  if ((conn = PQconnectdb(connstr)) != NULL &&
  (PQstatus(conn) == CONNECTION_OK ||
   PQconnectionNeedsPassword(conn)))
  {
  PQfinish(conn);
  success = true;
  break;
  }
 
 
  Because pg_hba.conf has this:
  # TYPE  DATABASEUSERCIDR-ADDRESS  METHOD
 
  # IPv4 local connections:
  hostall all 127.0.0.1/32  trust
  # IPv6 local connections:
  hostall all ::1/128   trust
 
  to allow connections from the local machine.
 
  (A password is not needed to connect, so the code always fails).
 
 
 
 
 Why won't
 
   PQstatus(conn) == CONNECTION_OK
 
 be true and thus the code will succeed without requiring a password?

It returns the value CONNECTION_STARTED.

I have found a work-around for now.

If I set the method to "password" in pg_hba.conf, the service starts and
runs correctly.

It is only when the method is set to "trust" that we get 100% failures.




Re: [HACKERS] Building Postgresql under Windows question

2009-04-28 Thread Dann Corbit
Pg_ctl.exe is exiting with a success code from line 1946 of PG_CTL.C

This is not appropriate behavior for a service unless shutdown has been
requested.

 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Dann Corbit
 Sent: Tuesday, April 28, 2009 11:47 AM
 To: pgsql-hackers@postgresql.org
 Subject: [HACKERS] Building Postgresql under Windows question
 
 We are still having trouble with the service controller pg_ctl.exe
 vanishing after some period of time.
 
 Hence I am doing debug builds from the source tree according to the
 instructions found at:

http://developer.postgresql.org/pgdocs/postgres/install-win32-full.html
 
 Specifically, I am using this method:
 It is also possible to build from inside the Visual Studio GUI. In
this
 case, you need to run:
  perl mkvcbuild.pl
 
 There are a few projects that will not build (only 7 out of 99).
 
 I have not installed ossp-uuid yet, so one of the failures is
expected.
 On the CONTRIB projects I am not concerned at all.
 However, on the internationalization failures, I want to know what the
 failures mean.
 Here is the output of the debug build process from the Visual Studio
 IDE
 (includes all failures):
 
 1-- Build started: Project: utf8_and_euc_kr, Configuration: Debug
 Win32 --
 1Generate DEF file
 1Not re-generating UTF8_AND_EUC_KR.DEF, file already exists.
 1Linking...
 1utf8_and_euc_kr.def : error LNK2001: unresolved external symbol
 euc_jp_to_utf8
 1utf8_and_euc_kr.def : error LNK2001: unresolved external symbol
 pg_finfo_euc_jp_to_utf8
 1utf8_and_euc_kr.def : error LNK2001: unresolved external symbol
 pg_finfo_utf8_to_euc_jp
 1utf8_and_euc_kr.def : error LNK2001: unresolved external symbol
 utf8_to_euc_jp
 1Debug\utf8_and_euc_kr\utf8_and_euc_kr.lib : fatal error LNK1120: 4
 unresolved externals
 2-- Build started: Project: utf8_and_euc_cn, Configuration: Debug
 Win32 --
 3-- Build started: Project: euc_kr_and_mic, Configuration: Debug
 Win32 --
 1Build log was saved at
 file://c:\dcorbit64\postgresql\postgresql-
 8.3.7\Debug\utf8_and_euc_kr\B
 uildLog.htm
 1utf8_and_euc_kr - 5 error(s), 0 warning(s)
 2Generate DEF file
 3Generate DEF file
 3Not re-generating EUC_KR_AND_MIC.DEF, file already exists.
 2Not re-generating UTF8_AND_EUC_CN.DEF, file already exists.
 3Linking...
 2Linking...
 3euc_kr_and_mic.def : error LNK2001: unresolved external symbol
 euc_jp_to_mic
 2utf8_and_euc_cn.def : error LNK2001: unresolved external symbol
 koi8r_to_utf8
 2utf8_and_euc_cn.def : error LNK2001: unresolved external symbol
 pg_finfo_koi8r_to_utf8
 3euc_kr_and_mic.def : error LNK2001: unresolved external symbol
 euc_jp_to_sjis
 2utf8_and_euc_cn.def : error LNK2001: unresolved external symbol
 pg_finfo_utf8_to_koi8r
 3euc_kr_and_mic.def : error LNK2001: unresolved external symbol
 mic_to_euc_jp
 2utf8_and_euc_cn.def : error LNK2001: unresolved external symbol
 utf8_to_koi8r
 3euc_kr_and_mic.def : error LNK2001: unresolved external symbol
 mic_to_sjis
 2Debug\utf8_and_euc_cn\utf8_and_euc_cn.lib : fatal error LNK1120: 4
 unresolved externals
 3euc_kr_and_mic.def : error LNK2001: unresolved external symbol
 pg_finfo_euc_jp_to_mic
 3euc_kr_and_mic.def : error LNK2001: unresolved external symbol
 pg_finfo_euc_jp_to_sjis
 3euc_kr_and_mic.def : error LNK2001: unresolved external symbol
 pg_finfo_mic_to_euc_jp
 3euc_kr_and_mic.def : error LNK2001: unresolved external symbol
 pg_finfo_mic_to_sjis
 3euc_kr_and_mic.def : error LNK2001: unresolved external symbol
 pg_finfo_sjis_to_euc_jp
 3euc_kr_and_mic.def : error LNK2001: unresolved external symbol
 pg_finfo_sjis_to_mic
 3euc_kr_and_mic.def : error LNK2001: unresolved external symbol
 sjis_to_euc_jp
 3euc_kr_and_mic.def : error LNK2001: unresolved external symbol
 sjis_to_mic
 3Debug\euc_kr_and_mic\euc_kr_and_mic.lib : fatal error LNK1120: 12
 unresolved externals
 2Build log was saved at
 file://c:\dcorbit64\postgresql\postgresql-
 8.3.7\Debug\utf8_and_euc_cn\B
 uildLog.htm
 2utf8_and_euc_cn - 5 error(s), 0 warning(s)
 4-- Build started: Project: utf8_and_shift_jis_2004,
 Configuration:
 Debug Win32 --
 4Generate DEF file
 3Build log was saved at
 file://c:\dcorbit64\postgresql\postgresql-
 8.3.7\Debug\euc_kr_and_mic\Bu
 ildLog.htm
 3euc_kr_and_mic - 13 error(s), 0 warning(s)
 4Not re-generating UTF8_AND_SHIFT_JIS_2004.DEF, file already exists.
 5-- Build started: Project: uuid-ossp, Configuration: Debug Win32
 --
 5Compiling...
 4Linking...
 5uuid-ossp.c
 4utf8_and_shift_jis_2004.def : error LNK2001: unresolved external
 symbol pg_finfo_uhc_to_utf8
 4utf8_and_shift_jis_2004.def : error LNK2001: unresolved external
 symbol pg_finfo_utf8_to_uhc
 4utf8_and_shift_jis_2004.def : error LNK2001: unresolved external
 symbol uhc_to_utf8
 4utf8_and_shift_jis_2004.def : error LNK2001: unresolved external
 symbol utf8_to_uhc
 4Debug\utf8_and_shift_jis_2004\utf8_and_shift_jis_2004.lib : fatal
 error LNK1120: 4

[HACKERS] One click installer and pgInstaller collide

2009-04-10 Thread Dann Corbit
If you install with the PostgreSQL group's installer for Windows:
http://www.postgresql.org/ftp/binary

And then install a second instance of PostgreSQL using the EnterpriseDB
one click installer:
http://www.enterprisedb.com/products/pgdownload.do#windows

An error may occur.  It will occur, probably, either with one system or
the other.
The pgAdmin III product uses a DLL called libiconv-2.dll.
One of these DLL files has an entry point called
libiconv_set_relocation_prefix and the other one does not.

When you reference the DLL file, it will complain about an entry point
that could not be located.

It is probably a fairly esoteric issue, since most people (if they
install multiple instances of a product) will install only a single
version of it.  But we support everything under the sun, so I purposely
try to install as many versions as possible for testing.

Anyway, I thought you might like to know about this odd collision.

64 bit Windows platform.
PG Installer version, installed first, was 8.3.5
EnterpriseDB version was 8.3.7
There was an error message (file in use) during installation of the
second tool set.

It is also possible that it was caused by the small step from 8.3.5 to
8.3.7 and the problem would have occurred with an upgrade using the same
version (I did not test to find out).

I do not know if this message will get to the PG group.  None of my
messages has shown up this week for some reason.




[HACKERS] FW: Multiple character encodings within a single database/table?

2009-04-10 Thread Dann Corbit
 -Original Message-
 From: Dann Corbit
 Sent: Monday, March 23, 2009 3:50 PM
 To: pgsql-gene...@postgresql.org
 Subject: Multiple character encodings within a single database/table?
 
 If I have the C locale, can I have multiple character encodings
within:
 1.  A single database?
 2.  A single table?
 
 More specifically, I would like to be able to have Unicode columns and
 ASCII text columns within the same table.  Is this possible?  If so,
 how do I achieve it?
 
 It was not clear to me from:
 http://www.postgresql.org/docs/current/static/multibyte.html
 
 It seems to me from this statement:
 "It can be overridden when you create a database, so you can have
 multiple databases each with a different character set."
 That it may be database wide, but I am not sure that it is not
possible
 to have both ordinary char and Unicode in the same table.
 
 Possible or not?

I know that this message is better placed on the general group, but I
got no reply in 24 hours.
So perhaps I can get a rise in hackers...



[HACKERS] Windows installation service

2009-04-10 Thread Dann Corbit
The Windows installation service uses pg_ctl to perform the network
start-up operation.
This program starts up the postmaster and exits.
The net effect of performing the operation in this manner is that the
Windows service manager sees the service as not running a few minutes
after the startup is complete.  It also prevents proper pause and
restart of the service.

As a suggestion:
Instead of installing pg_ctl as the service, start up postgres as the
service.  This is how we did our Windows port.  If the idea is appealing
to the PostgreSQL group, we can send our service code modifications for
review as a possible alternative to the current method.

Another approach that could be equally helpful (along the same lines) is
to leave pg_ctl.exe in memory and allow it to control the program.
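The "leave the controller resident" idea can be sketched in a few lines (a hypothetical supervisor, not pg_ctl's actual code): launch the child, stay blocked on it so a service manager always sees a live process, and forward stop requests.

```python
import signal
import subprocess
import sys

# Hypothetical supervisor sketch: a short sleep stands in for the
# postmaster process being managed.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.2)"])

def forward_stop(signum, frame):
    child.terminate()  # relay a stop request to the child

signal.signal(signal.SIGTERM, forward_stop)

rc = child.wait()  # block here; the supervisor stays "running" meanwhile
print("child exited with code", rc)
```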



Re: [HACKERS] Windows installation service

2009-04-10 Thread Dann Corbit
 -Original Message-
 From: Dave Page [mailto:dp...@pgadmin.org]
 Sent: Friday, April 10, 2009 8:16 AM
 To: Dann Corbit
 Cc: pgsql-hackers@postgresql.org; Bill Luton; Larry McGhaw; Mike McKee;
 Brian Fifer
 Subject: Re: [HACKERS] Windows installation service
 
 On Mon, Apr 6, 2009 at 9:32 PM, Dann Corbit dcor...@connx.com wrote:
  The Windows installation service uses pg_ctl to perform the network
  start-up operation.
  This program starts up the postmaster and exits.
  The net effect of performing the operation in this manner is that the
  Windows service manager sees the service as not running a few
 minutes
  after the startup is complete.  It also prevents proper pause and
  restart of the service.
 
 Per our offlist conversation, this is not how it works.
 
  As a suggestion:
  Instead of installing pg_ctl as the service, start up postgres as the
  service.  This is how we did our Windows port.  If the idea is
 appealing
  to the PostgreSQL group, we can send our service code modifications
 for
  review as a possible alternative to the current method.
 
  Another approach that could be equally helpful (along the same lines)
 is
  to leave pg_ctl.exe in memory and allow it to control the program.
 
 Which is what does happen.

I don't know the reason why, but that is not what happens here.

We see the problem on 64-bit machines with Windows 2008 Server.
We see the problem on 32-bit machine with Windows 2003 Server.
We see the problem on 32-bit Windows XP machines.
It is universal (all of these machines demonstrate the problem).

I did get this email from Mike McKee this morning:

I noticed a pattern.  

The first time it works, and can shut down.

The second time is where it kind of hangs 




Re: [HACKERS] Windows installation service

2009-04-10 Thread Dann Corbit
 -Original Message-
 From: Dann Corbit
 Sent: Friday, April 10, 2009 12:30 PM
 To: 'Dave Page'
 Cc: pgsql-hackers@postgresql.org; Bill Luton; Larry McGhaw; Mike McKee;
 Brian Fifer
 Subject: RE: [HACKERS] Windows installation service
 
  -Original Message-
  From: Dave Page [mailto:dp...@pgadmin.org]
  Sent: Friday, April 10, 2009 8:16 AM
  To: Dann Corbit
  Cc: pgsql-hackers@postgresql.org; Bill Luton; Larry McGhaw; Mike
 McKee;
  Brian Fifer
  Subject: Re: [HACKERS] Windows installation service
 
  On Mon, Apr 6, 2009 at 9:32 PM, Dann Corbit dcor...@connx.com
 wrote:
   The Windows installation service uses pg_ctl to perform the network
   start-up operation.
   This program starts up the postmaster and exits.
   The net effect of performing the operation in this manner is that
 the
   Windows service manager sees the service as not running a few
  minutes
   after the startup is complete.  It also prevents proper pause and
   restart of the service.
 
  Per our offlist conversation, this is not how it works.
 
   As a suggestion:
   Instead of installing pg_ctl as the service, start up postgres as
 the
   service.  This is how we did our Windows port.  If the idea is
  appealing
   to the PostgreSQL group, we can send our service code modifications
  for
   review as a possible alternative to the current method.
  
   Another approach that could be equally helpful (along the same
 lines)
  is
   to leave pg_ctl.exe in memory and allow it to control the program.
 
  Which is what does happen.
 
 I don't know the reason why, but that is not what happens here.
 
 We see the problem on 64-bit machines with Windows 2008 Server.
 We see the problem on 32-bit machine with Windows 2003 Server.
 We see the problem on 32-bit Windows XP machines.
 It is universal (all of these machines demonstrate the problem).
 
 I did get this email from Mike McKee this morning:
 
 I noticed a pattern.
 
 The first time it works, and can shutdown.
 
 The second time is where it kind of hangs 

I should mention that PostgreSQL is still operational.  The Postgres servers 
are in memory and I am able to perform queries, despite the apparent status of 
the service.




Re: [HACKERS] Windows installation service

2009-04-10 Thread Dann Corbit
 -Original Message-
 From: Dave Page [mailto:dp...@pgadmin.org]
 Sent: Friday, April 10, 2009 1:58 PM
 To: Dann Corbit
 Cc: pgsql-hackers@postgresql.org; Bill Luton; Larry McGhaw; Mike
McKee;
 Brian Fifer
 Subject: Re: [HACKERS] Windows installation service
 
 On Fri, Apr 10, 2009 at 8:29 PM, Dann Corbit dcor...@connx.com
wrote:
 
  I don't know the reason why, but that is not what happens here.
 
  We see the problem on 64-bit machines with Windows 2008 Server.
  We see the problem on 32-bit machine with Windows 2003 Server.
  We see the problem on 32-bit Windows XP machines.
  It is universal (all of these machines demonstrate the problem).
 
  I did get this email from Mike McKee this morning:
 
  I noticed a pattern.
 
  The first time it works, and can shutdown.
 
  The second time is where it kind of hangs 
 
 So what is unusual about your machines that makes this happen for you,
 but apparently no one else we've heard of? Are you running on a domain?

Yes.  This is a corporation, so we don't run peer to peer.

 Any nonstandard security policy or other configuration?

Nothing unusual.

 I know Connx have experience developing on Windows, so can one of you
 attach a debugger and see why the WaitForMultipleObjects() call in
 pg_ctl doesn't work on your systems?

Here is what we tried:

We decided to start the server from the command line as an ordinary
executable rather than as a service.

First, we found permissions problems:

C:\CONNX32\CONNXSTORE\binC:\CONNX32\CONNXSTORE\bin\pg_ctl.exe start -w
-N pgsql-8.3 -D C:\CONNX32\CONNXSTORE\data\
waiting for server to start...2009-04-10 14:55:22 PDT LOG:  loaded
library $libdir/plugins/plugin_debugger.dll
2009-04-10 14:55:22 PDT PANIC:  could not open control file
global/pg_control: Permission denied

This application has requested the Runtime to terminate it in an unusual
way.
Please contact the application's support team for more information.
could not
start server

We allowed user postgres to have full control for the global folder.

C:\CONNX32\CONNXSTORE\binC:\CONNX32\CONNXSTORE\bin\pg_ctl.exe start -w
-N pgsql-8.3 -D C:\CONNX32\CONNXSTORE\data\
waiting for server to start...2009-04-10 15:09:50 PDT LOG:  loaded
library $libdir/plugins/plugin_debugger.dll
2009-04-10 15:09:50 PDT LOG:  could not create file postmaster.opts:
Permission denied
... done
server started

We had a similar problem with the data folder, so we changed permissions
for user postgres to have full control for the data folder.

C:\CONNX32\CONNXSTORE\binC:\CONNX32\CONNXSTORE\bin\pg_ctl.exe start -w
-N pgsql-8.3 -D C:\CONNX32\CONNXSTORE\data\
waiting for server to start...2009-04-10 15:19:49 PDT LOG:  loaded
library $libdir/plugins/plugin_debugger.dll
could not
start server

After those changes, we no longer had permissions problems but the
server did not start from the command line.

I uninstalled PostgreSQL.

I removed the postgres user.

I removed the postgres directory structure.

I installed PostgreSQL (postgresql-8.3.7-1).

Apparently this version does not have the same bad symptoms (I have not
checked all the other machines yet -- that will take some time).




[HACKERS] Is there an official log reader for PostgreSQL?

2009-03-04 Thread Dann Corbit
I am interested in doing a near real time transaction processor.  If
there is a place to find the specification for the log reader, I would
be keen to know it.




Re: [HACKERS] Is there an official log reader for PostgreSQL?

2009-03-04 Thread Dann Corbit
 -Original Message-
 From: Dimitri Fontaine [mailto:dfonta...@hi-media.com]
 Sent: Wednesday, March 04, 2009 12:28 PM
 To: Dann Corbit
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Is there an official log reader for PostgreSQL?
 
 Hi,
 
 Le 4 mars 09 à 21:17, Dann Corbit a écrit :
  I am interested in doing a near real time transaction processor.  If
  there is a place to find the specification for the log reader, I
 would
  be keen to know it.
 
 
 Have you read about PGQ yet? Some links include:
http://www.pgcon.org/2008/schedule/events/79.en.html
http://kaiv.wordpress.com/2007/10/19/skytools-database-scripting-
 framework-pgq/
http://skytools.projects.postgresql.org/doc/pgq-admin.html
http://skytools.projects.postgresql.org/doc/pgq-sql.html

Not yet.  I will read it.  Thanks.
 
 If your processing is to be coded in python, the facilities are
 provided in the current skytools releases. If your processing is to be
 coded in PHP, the facilities exists in the CVS and will get
 distributed soon'ish.
 
 If your processing is to be coded in any other language, you need SQL
 access and will be able to use the SQL level API directly, which
 provides all the raw features.

The project will be coded in C++.



[HACKERS] Kerberos V5 required for PostgreSQL installation on Windows

2009-02-26 Thread Dann Corbit
If Kerberos V5 is not installed on a Windows platform, the following
error dialog is returned upon attempted installation:

Postgres.exe - Unable to Locate Component

This application has failed to start because krb5_32.dll was not found.
Re-installing the application may fix this problem.
[OK]




Re: [HACKERS] Is a plan for lmza commpression in pg_dump

2009-02-04 Thread Dann Corbit
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Bruce Momjian
 Sent: Wednesday, February 04, 2009 3:28 PM
 To: Stanislav Lacko
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Is a plan for lmza commpression in pg_dump
 
 Stanislav Lacko wrote:
  Hi.
 
  Is it in todo or in a plan to implement lmza commpression in pg_dump
  backups?
 
 Nope, never heard anything about it.

In case the PG group does get interested in insertion of compression
algorithms into PostgreSQL {it seems it could be useful in many
different areas}, the 7zip format seems to be excellent in a number of
ways.

Here is an interesting benchmark that shows 7z format winning a large
area of the optimal compressors performance graph:
http://users.elis.ugent.be/~wheirman/compression/

The LZMA SDK is granted to the public domain:
http://www.7-zip.org/sdk.html

Unfortunately LZOP (which wins the top half of the optimal compressors
graph where the compression and decompression speed is more important
than amount of compression) does not have a liberal license.
http://www.lzop.org/




Re: [HACKERS] 8.4 release planning

2009-01-26 Thread Dann Corbit
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
 ow...@postgresql.org] On Behalf Of Joshua D. Drake
 Sent: Monday, January 26, 2009 7:42 PM
 To: KaiGai Kohei
 Cc: Tom Lane; Ron Mayer; Josh Berkus; Robert Haas; Merlin Moncure;
 Jonah H. Harris; Gregory Stark; Simon Riggs; Bruce Momjian; Bernd
 Helmle; Peter Eisentraut; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] 8.4 release planning
 
 On Tue, 2009-01-27 at 12:30 +0900, KaiGai Kohei wrote:
 
 
  BTW, I have not walked on water yet.
 
 
 I have but I always end up wet. :)

I find that it helps to freeze the water first.
;-)




Re: [HACKERS] Simple postgresql.conf wizard

2008-11-25 Thread Dann Corbit
 -Original Message-
 From: Greg Stark [mailto:[EMAIL PROTECTED] On Behalf Of
 Gregory Stark
 Sent: Tuesday, November 25, 2008 5:06 PM
 To: Decibel!
 Cc: Tom Lane; Dann Corbit; Robert Haas; Bruce Momjian; Mark Wong;
 Heikki Linnakangas; Josh Berkus; Greg Smith; pgsql-
 [EMAIL PROTECTED]
 Subject: Re: [HACKERS] Simple postgresql.conf wizard


[snip]

 As Dann said, the idea that there IS a magic number is the problem.
 *Any*
 value of default_stats_target will cause problems. Some columns will
 always
 have skewed data sets which require unusually large samples, but most
 won't
 and the system will run faster with a normal sample size for that
 majority.

No, it was somebody smarter than me who said that.

My idea was to create some kind of table which shows curves for
different values, so that users would have some sort of basis for
choosing.
Of course, the guy who has 40 tables in his join with an average of 7
indexes on each table (each table containing millions of rows) and a
convoluted WHERE clause will have different needs than someone who has
simple queries and small data loads.  The quality of the current
statistical measures stored will also affect the intelligence of the
query preparation process, I am sure.  I do have a guess that larger and
more expensive queries can probably benefit more from larger samples
(this principle is used in sorting, for instance, where the sample I
collect to estimate the median might grow as {for instance} the log of
the data set size).

P.S.
I also do not believe that there is any value that will be the right
answer.  But a table of data might be useful both for people who want to
toy with altering the values and also for those who want to set the
defaults.  I guess that at one time such a table was generated to
produce the initial estimates for default values.
[snip]




Re: [HACKERS] Simple postgresql.conf wizard -- Statistics idea...

2008-11-25 Thread Dann Corbit
 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, November 25, 2008 5:33 PM
 To: Dann Corbit
 Cc: Gregory Stark; Decibel!; Robert Haas; Bruce Momjian; Mark Wong;
 Heikki Linnakangas; Josh Berkus; Greg Smith; pgsql-
 [EMAIL PROTECTED]
 Subject: Re: [HACKERS] Simple postgresql.conf wizard
 
 Dann Corbit [EMAIL PROTECTED] writes:
  I also do not believe that there is any value that will be the right
  answer.  But a table of data might be useful both for people who
want
 to
  toy with altering the values and also for those who want to set the
  defaults.  I guess that at one time such a table was generated to
  produce the initial estimates for default values.
 
 Sir, you credit us too much :-(.  The actual story is that the current
 default of 10 was put in when we first implemented stats histograms,
 replacing code that kept track of only a *single* most common value
 (and not very well, at that).  So it was already a factor of 10 more
 stats than we had experience with keeping, and accordingly
conservatism
 suggested not boosting the default much past that.
 
 So we really don't have any methodically-gathered evidence about the
 effects of different stats settings.  It wouldn't take a lot to
 convince
 us to switch to a different default, I think, but it would be nice to
 have more than none.

I do have a statistics idea/suggestion (possibly useful with some future
PostgreSQL 9.x or something):
It is a simple matter to calculate lots of interesting univariate
summary statistics with a single pass over the data (perhaps during a
vacuum full).
For instance with numerical columns, you can calculate mean, min, max,
standard deviation, skew, kurtosis and things like that with a single
pass over the data.
Here is a C++ template I wrote to do that:
http://cap.connx.com/tournament_software/STATS.HPP
It also uses this template:
http://cap.connx.com/tournament_software/Kahan.Hpp
which is a high-accuracy adder.  These things could easily be rewritten
in C instead of C++.
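For reference, the compensated-summation idea behind such a high-accuracy adder can be sketched in a few lines (a generic illustration of the Kahan algorithm in Python; this is my own sketch, not the code at the URL above):

```python
def kahan_sum(values):
    """Compensated (Kahan) summation: track the low-order bits lost
    at each addition and feed them back into the next one."""
    total = 0.0
    compensation = 0.0  # running estimate of the accumulated rounding error
    for x in values:
        y = x - compensation
        t = total + y
        # (t - total) is the part of y that survived the addition;
        # subtracting y recovers the low-order bits that were rounded away.
        compensation = (t - total) - y
        total = t
    return total
```

A naive left-to-right sum of ten 0.1s drifts away from 1.0; the compensated sum stays at the correctly rounded result.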

Now, if you store a few numbers calculated in this way, they can be used
to augment your histogram data when you want to estimate the volume of a
request. So (for instance) if someone asks for scalars that are > some
value, you can look to see what percentage of the tail will hang out in
that neck of the woods using the standard deviation and the mean.
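As a rough illustration of both points, here is a minimal Python sketch (my own, with hypothetical names; the tail estimate assumes roughly normal data) of a one-pass moment accumulator and the kind of selectivity guess it enables:

```python
import math

class RunningStats:
    """One-pass accumulator for count, mean, and central moments
    (Welford-style update), enough for variance, skew, and kurtosis."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations
        self.m3 = 0.0
        self.m4 = 0.0

    def add(self, x):
        n1 = self.n
        self.n += 1
        delta = x - self.mean
        dn = delta / self.n
        term1 = delta * dn * n1
        self.mean += dn
        self.m4 += (term1 * dn * dn * (self.n * self.n - 3 * self.n + 3)
                    + 6 * dn * dn * self.m2 - 4 * dn * self.m3)
        self.m3 += term1 * dn * (self.n - 2) - 3 * dn * self.m2
        self.m2 += term1

    @property
    def variance(self):
        return self.m2 / (self.n - 1)

    def frac_greater_than(self, v):
        """Estimated fraction of values > v, assuming near-normal data."""
        sd = math.sqrt(self.variance)
        z = (v - self.mean) / sd
        return 0.5 * math.erfc(z / math.sqrt(2.0))
```

A planner could call something like `frac_greater_than` to guess the selectivity of a `col > v` predicate without touching a histogram.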

I have another, similar idea (possibly useful someday far off in the
future) that I think may have some merit.  The idea is to create a
statistical index.  This index is updated whenever data values are
modified in any way.  For scalar/ordinal values such as float, integer,
numeric it would simply store and update a statistics accumulator (a
small vector of a few items holding statistical moments, counts and
sums) for the column of interest.   These indexes would be very small
and inexpensive to {for instance} memory map.

For categorical values (things like 'color' or 'department') we might
store the count for the number of items that correspond to a hash in our
statistical index.  It would give you a crude count distinct for any
item -- the only caveat being that more than one item could possibly
have the same hash code (we would also keep a count of null items).  If
the count needed to be exact, we could generate a perfect hash for the
data or store each distinct column value in the categorical index with
its hash.  The size of such an index would depend on the data so that
bit or char='m'/'f' for male/female or 'y'/'n' for yes/no indexes would
contain just two counts and a column that is unique would have one hash
paired with the number one for each row of the table (a regular unique
index would clearly be better in such a case, but such distributions
could easily arise).  The value of an index like this is that it is good
where the normal index types are bad (e.g. it is useless to create a
btree index on a bit column or male/female, yes/no -- things of that
nature but a statistics counter would work nicely -- to give you whole
table measures only of course).  The thing that is odd about these
statistics indexes is that they do not even bother to point back to the
data that they represent -- they are just abstract measures of the
general contents.
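A toy model of such a categorical statistics index (hypothetical names; just to make the bookkeeping concrete, not a proposed implementation) might look like:

```python
from collections import Counter

class CategoricalStatsIndex:
    """Toy 'statistical index' for a categorical column: per-hash counts
    plus a null count, with no pointers back to the rows themselves."""
    def __init__(self):
        self.counts = Counter()
        self.nulls = 0

    def insert(self, value):
        if value is None:
            self.nulls += 1
        else:
            self.counts[hash(value)] += 1

    def delete(self, value):
        if value is None:
            self.nulls -= 1
        else:
            h = hash(value)
            self.counts[h] -= 1
            if self.counts[h] == 0:
                del self.counts[h]

    def approx_count(self, value):
        # May overcount if another value shares this hash code.
        return self.counts.get(hash(value), 0)

    def approx_distinct(self):
        # Crude "count distinct": number of distinct hash codes seen.
        return len(self.counts)
```

For an `'m'`/`'f'` column this degenerates to just two counters, which is exactly the case where a btree index is useless but whole-table cardinality estimates are still valuable.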

It seems to me that this kind of information could be used to improve
the query plans for both categorical values and scalars and also be used
to generate instant answers for some kinds of statistical queries.  If
used only for query planning, the values would not need to be exact, and
could be updated only at vacuum full time or some other convenient time.
The notion behind creation of a stats index would be that we may not
need to maintain this detailed information for every column, but we can
maintain such data for columns that we often filter with in our queries
to get an idea of cardinality for a subset.


Re: [HACKERS] Simple postgresql.conf wizard

2008-11-19 Thread Dann Corbit
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
 [EMAIL PROTECTED] On Behalf Of Robert Haas
 Sent: Wednesday, November 19, 2008 5:05 PM
 To: Bruce Momjian
 Cc: Mark Wong; Tom Lane; Heikki Linnakangas; Gregory Stark; Josh
 Berkus; Greg Smith; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Simple postgresql.conf wizard
 
  Yes, please test something.  I am tired if us saying we need to
 increase
  default_statistics_target, but because we don't know the magic
 number,
  we do nothing release after release.
 
 I think the idea that there IS a magic number is the problem.
 
 No amount of testing is ever going to refute the argument that, under
 some other workload, a different value might better.
 
 But that doesn't amount to a reason to leave it the way it is.

Perhaps a table of experimental data could serve as a rough guideline.




[HACKERS] Installation oddity -- installer insists that PostgreSQL has failed to start, even though it is started and functioning correctly

2008-11-18 Thread Dann Corbit
Sorry about the HTML post, but I wanted to include the error form.

 

When trying to perform an upgrade with PostgreSQL 8.3.5 over 8.3.4, the
installer insists that PostgreSQL has failed to start.

 

However, I can see the processes in memory (it has indeed started) and
even attach with PG Admin III and perform queries.

 

 

 

PG Admin III confirms the version string as:

PostgreSQL 8.3.5, compiled by Visual C++ build 1400.

 

My machine has the following parameters:

 



CrystalCPUID Pure x64 Edition 4.15.2.451 (C) 2002-2008 hiyohiyo

Crystal Dew World : http://crystalmark.info/



 

 OS : Windows Server 2008 Server Standard Edition (full
installation) SP1 [ 6.0 Build 6001 ]

   Date : 2008/11/18 13:11:48

 

   CPU Name : Intel Core 2 Extreme (Yorkfield)

  Vendor String : GenuineIntel

Name String : Intel(R) Core(TM)2 Extreme CPU X9650  @ 3.00GHz

   Architecture : x64

   Process Rule : 45 nm

   Platform : LGA775 [4]

   CPU Type : Original OEM processor [0]

 Number (Total) : 4

  Physical Core : 4

 Family : 6h

Ext. Family : 6h

  Model : 7h

 Ext. Model : 17h

   Stepping : 6h

  Ext. Stepping : 6h

   Microcode ID : 60Bh

Feature : MMX SSE SSE2 SSE3 SSSE3 SSE4.1 XD VT Intel 64

PowerManagement : SpeedStep

 

CurrentOriginal

  Clock : 2992.45 MHz 3000.00 MHz

   System Clock :  332.49 MHz  333.33 MHz

 System Bus : 1329.98 MHz 1333.33 MHz

 Multiplier :9.009.00

  Data Rate :QDR

 Over Clock :   -0.25 %

 

 L1 I-Cache :   32 KB

 L1 D-Cache :   32 KB

   L2 Cache : 6144 KB [Full:2992.45 MHz]

 Memory : 4029 MB

 

   CPUID: EAX  EBX  ECX  EDX

   --

    : 000A 756E6547 6C65746E 49656E69

   0001 : 00010676 00040800 0008E3BD BFEBFBFF

   0002 : 05B0B101 005657F0  2CB4304E

   0003 :    

   0004 :    

   0005 : 0040 0040 0003 0020

   0006 : 0001 0002 0001 

   0007 :    

   0008 : 0400   

   0009 :    

   000A : 07280202   0503

   8000 : 8008   

   8001 :   0001 20100800

   8002 : 65746E49 2952286C 726F4320 4D542865

   8003 : 45203229 65727478 4320656D 58205550

   8004 : 30353639 20402020 30302E33 007A4847

   8005 :    

   8006 :   18008040 

   8007 :    

   8008 : 3024   

 

   MSR  : EAX1 EDX1 EAX2 EDX2

: 42480800  88C4091F 0C10

   QPF  : 14.318180 MHz

 



 Chipset Information



Chipset : Intel Unknown

   North Bridge : [8086:29E0.00] Intel Unknown

   South Bridge : [8086:2916.02] Intel 82801IR (ICH9R)

 Video Chip : [10DE:040E.A1] NVIDIA Unknown

 



 BIOS Information by WMI (Windows Management Interface)



BIOS Vendor : Dell Inc.

   BIOS Caption : Phoenix ROM BIOS PLUS Version 1.10 A05

   BIOS Version : DELL   - 15

  BIOS Date : 05/09/2008

SM BIOS Caption : A05

SM BIOS Version : 2.5

  Mother Vendor : Dell Inc.

 Mother Product : 0TP412

 Mother Version :

  System Vendor : Dell Inc.

   System Model : Precision WorkStation T3400  

 



 System Information by DMI (Desktop Management Interface)



SM BIOS Version : 2.5

BIOS Vendor : Dell Inc.

   BIOS Version : A05

  BIOS Date : 05/09/2008

  BIOS ROM Size : 1024 KB

  Mother Vendor : Dell Inc.

 Mother Product : 0TP412

 Mother Version :

 CPU Socket : CPU

 CPU Vendor : Intel

CPU Version : CPU

  Current Clock : 3000 MHz

 External Clock : 1333 MHz

  Max Clock : 5200 MHz

 



 PCI Device Information



No Bus Dev Fnc VendorDeviceRev Class   DeviceType

 0   0   0   0 8086h:29E0h.00h 06h Host Bridge

 1   0   1   0 8086h:29E1h.00h 060400h PCI to PCI Bridge

 2 

[HACKERS] Strange query behavior where clause produces odd behavior on '<' query

2008-10-30 Thread Dann Corbit
The following query:
SELECT * FROM Customers_connxstore where customerid < 'specd'

Returns the row containing Customers_connxstore.customerid == 'SPECD'

I would expect to get that row if the query was:
SELECT * FROM Customers_connxstore where customerid <= 'specd'

The other operators (>, >=, <>, =, !=) all work as expected.

Sample file to reproduce the problem:
http://cap.connx.com/bugreport/pgbug.sql.bz2

Is this a known issue with PostgreSQL or for some reason the desired
behavior?




Re: [HACKERS] Strange query behavior where clause produces odd behavior on '<' query

2008-10-30 Thread Dann Corbit
 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED]
 Sent: Thursday, October 30, 2008 5:31 PM
 To: Dann Corbit
 Cc: [HACKERS]; Sherry Griffin
 Subject: Re: [HACKERS] Strange query behavior where clause produces odd
 behavior on '<' query
 
 Dann Corbit [EMAIL PROTECTED] writes:
  The following query:
  SELECT * FROM Customers_connxstore where customerid < 'specd'
  Returns the row containing Customers_connxstore.customerid ==
'SPECD'
 
 What encoding/locale are you using?  

Whatever the default encoding/locale is.
I did not define any custom encoding, locale, or collating sequence.

 And while I'm asking, which PG
 version?

All versions from PostgreSQL 7.1.3 to PostgreSQL 8.3.3, compiled by
Visual C++ build 1400 seem to display this behavior.
Are you unable to reproduce it?




Re: [HACKERS] Strange query behavior where clause produces odd behavior on '<' query

2008-10-30 Thread Dann Corbit
 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED]
 Sent: Thursday, October 30, 2008 6:34 PM
 To: Dann Corbit
 Cc: [HACKERS]; Sherry Griffin
 Subject: Re: [HACKERS] Strange query behavior where clause produces odd
 behavior on '<' query
 
 Dann Corbit [EMAIL PROTECTED] writes:
  What encoding/locale are you using?
 
  Whatever the default encoding/locale is.
 
 Whatever is the wrong answer here.  I just finished verifying that
 the
 sort order you're complaining about is the expected ordering in some
 locales.  I suggest that you take the trouble to find out.

English (United States) is my locale.
 
  Are you unable to reproduce it?
 
 Well, I see this on a Fedora machine:
 
 $ cat foo
 specd
 SPECD
 $ sort foo
 SPECD
 specd
 $ LANG=en_US sort foo
 specd
 SPECD
 $

The compare works as it should.
The only bug was in my understanding.




Re: [HACKERS] Multi CPU Queries - Feedback and/or suggestions wanted!

2008-10-29 Thread Dann Corbit
Hi Simon,

He is going to do some investigation in the methods and
write down the possibilities and then he is going to implement
something from that for PostgreSQL.

When will this work be complete? We are days away from completing main
work on 8.4, so you won't get much discussion on this for a few months
yet. Will it be complete in time for 8.5? Or much earlier even?
 
The first guess is that the work will be done for 8.6. 
Dano is supposed to finish the work and defend his thesis in something
a bit more than a year.

Julius, you don't mention what your role is in this. In what sense is
Dano's master's thesis a we thing?
 
I am Dano's mentor and we are in close contact with Zdenek as well.
We would like the project to become a we thing, as another reason to
work on the project.
It seems better to research some ideas at the beginning and discuss the
stuff during development than to individually write some piece of code
which could be published afterwards, especially when this area seems to
be of interest to more people.

Threads are where future performance is going to come from:
General purpose-
http://www.setup32.com/hardware/cpuchipset/32core-processors-intel-reache.php

GPU-
http://wwwx.cs.unc.edu/~lastra/Research/GPU_performance.html
http://www.cs.unc.edu/~geom/GPUSORT/results.html

Database engines that want to exploit the ultimate in performance will
utilize multiple threads of execution.
True, the same thing can be realized by multiple processes, but a
process is more expensive than a thread.



Re: [HACKERS] Plugin system like Firefox

2008-08-12 Thread Dann Corbit
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
 [EMAIL PROTECTED] On Behalf Of Bruce Momjian
 Sent: Tuesday, August 12, 2008 9:04 PM
 To: Robert Haas
 Cc: Markus Wanner; Matthew T. O'Connor; Tom Lane; Greg Sabino Mullane;
 pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Plugin system like Firefox
 
 Robert Haas wrote:
.. an OS-agnostic way of installing packages.
   Uh.. I don't think such a thing exists.
   Seems to in Firefox.
 
  I think nearly all Firefox extensions are written in XML and
  Javascript.  It's quite a bit easier to have an OS-agnostic way of
  installing packages when the packages are platform-independent.
 
I realize many prefer their OS-native packaging system, but
that isn't the target audience of a packaging system that will
 increase
adoption.
 
  Really?  Who is it?
 
 People who want to try a database and its plugins to see if they like
 it.

For what it's worth, I am rather fond of the firefox plugin for sqlite.

Now, SQLite has very little in the way of administrative tools, so that
tilts the field, but I guess other people might toy with PostgreSQL if a
plugin were available and might not otherwise just because one is
available.




Re: [HACKERS] Whence cometh the data in src/test/regress/data/streets.data ?

2008-07-25 Thread Dann Corbit
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
 [EMAIL PROTECTED] On Behalf Of Tom Lane
 Sent: Friday, July 25, 2008 12:52 PM
 To: Bjorn Munch
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Whence cometh the data in
 src/test/regress/data/streets.data ?
 
 Bjorn Munch [EMAIL PROTECTED] writes:
  This file was checked in way back in July 1996, by Marc G. Fournier
  but that doesn't mean he was the one who got the data from
  somewhere. Does anyone know where it comes from? Or has this
  information been lost in the mist of time?
 
 It's in the postgres v4r2 tarball with a file date of 1993-01-08,
 which means nobody around the current project has a clue.
 
 IANAL either, but I think it'd be fairly hard for anyone to assert a
 copyright on it given that it's a compilation of publicly available
 facts, and is surely not in the same format the information was
 originally published in anyhow.

Probably something from here:
http://www.freegis.org/database/?cat=1



Re: [HACKERS] [RFC] Unsigned integer support.

2008-07-25 Thread Dann Corbit
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
 [EMAIL PROTECTED] On Behalf Of Tom Lane
 Sent: Friday, July 25, 2008 12:32 PM
 To: Ryan Bradetich
 Cc: Gregory Stark; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] [RFC] Unsigned integer support.
 
 Ryan Bradetich [EMAIL PROTECTED] writes:
  On Fri, Jul 25, 2008 at 3:57 AM, Gregory Stark
 [EMAIL PROTECTED] wrote:
  Ryan Bradetich [EMAIL PROTECTED] writes:
  My plans for the example above would be:
 
  1. SELECT 15 + 15  -- Throws overflow error.
  2. SELECT 15::uint4 + 15 -- Returns
 30::uint4.
 
  You could make it work by having a uint4+int4 operator which
returns
 uint4 but
  then you're going to need a *lot* of operators
 
  This was my plan.
 
 Like he says, it's a *lot* of operators, and the point doesn't seem
 entirely clear to me.  You'll still have overflow cases, they'll just
 be
 in different places.
 
 Consider the idea of not having any uint4-specific arithmetic
 operators,
 but instead providing the following:
 
   * assignment casts from int4 and int8 to uint4
 (these throw error if out of range, of course)
   * implicit cast from uint4 to int8 (can never fail)
 
 The effect of providing the latter cast would be that any arithmetic
 involving a uint4 column would automatically be done in int8.  Which
 would make it a shade slower than a native implementation, but
probably
 not enough slower to be a problem --- and you'd avoid having to write
 dozens of operators and underlying support functions.  Storing into
the
 uint4 column would work fine with no extra notation because of the
 assignment casts.
 
 Moreover, you'd avoid cluttering the system with a pile of cross-type
 operators, which we have recently realized are not a good thing,
 because
 they increase the likelihood of ambiguous operator problems --- see
 http://archives.postgresql.org/pgsql-hackers/2008-06/msg00750.php
 
 For uint8 you'd have to promote to numeric to guarantee no failure
 in the implicit cast; which is going to be a rather bigger performance
 hit, but I don't really see uint8 as being a type with huge demand.
 
 Now you probably *will* want cross-type comparison operators, if you
 are going to support indexing of unsigned columns, so that something
 like
   uint4col > 42
 can be indexed without any casting.  But limiting yourself to the six
 basic comparison operators certainly makes it a much less bulky
 project.

At the cost of one bit of storage, you have compatible types using
CREATE DOMAIN:

CREATE DOMAIN name [ AS ] data_type
[ DEFAULT expression ]
[ constraint [ ... ] ]

where constraint is:

[ CONSTRAINT constraint_name ]
{ NOT NULL | NULL | CHECK (expression) }

More specifically:

CREATE DOMAIN usmallint AS SMALLINT CHECK(VALUE > 0);
CREATE DOMAIN uinteger AS INTEGER CHECK(VALUE > 0);
CREATE DOMAIN ubigint AS BIGINT CHECK(VALUE > 0);
CREATE DOMAIN unumeric AS NUMERIC CHECK(VALUE > 0);

Seems like a heck of a lot less work to me.  Not to mention very easy to
use.

C:\Program Files (x86)\PostgreSQL\8.3\binpsql -h localhost -U postgres
domaintest
Password for user postgres:
Welcome to psql 8.3.3, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
   \h for help with SQL commands
   \? for help with psql commands
   \g or terminate with semicolon to execute query
   \q to quit

Warning: Console code page (437) differs from Windows code page (1252)
 8-bit characters might not work correctly. See psql reference
 page Notes for Windows users for details.

domaintest=# CREATE DOMAIN usmallint AS SMALLINT CHECK(VALUE > 0);
CREATE DOMAIN
domaintest=# CREATE DOMAIN uinteger AS INTEGER CHECK(VALUE > 0);
CREATE DOMAIN
domaintest=# CREATE DOMAIN ubigint AS BIGINT CHECK(VALUE > 0);
CREATE DOMAIN
domaintest=# CREATE DOMAIN unumeric AS NUMERIC CHECK(VALUE > 0);
CREATE DOMAIN
domaintest=#
domaintest=# create table integer_types (
domaintest(# usCol usmallint,
domaintest(# sCol smallint,
domaintest(# uiCol uinteger,
domaintest(# iCol integer,
domaintest(# ubCol ubigint,
domaintest(# bCol bigint,
domaintest(# unCol unumeric,
domaintest(# nCol numeric
domaintest(# );
CREATE TABLE
domaintest=# create index i1 on integer_types(usCol);
CREATE INDEX
domaintest=# create index i2 on integer_types(sCol);
CREATE INDEX
domaintest=# create index i3 on integer_types(uiCol);
CREATE INDEX
domaintest=# create index i4 on integer_types(iCol);
CREATE INDEX
domaintest=# create index i5 on integer_types(ubCol);
CREATE INDEX
domaintest=# create index i6 on integer_types(bCol);
CREATE INDEX
domaintest=# create index i7 on integer_types(unCol);
CREATE INDEX
domaintest=# create index i8 on integer_types(nCol);
CREATE INDEX
domaintest=# insert into integer_types values(1,1,1,1,1,1,1,1);
INSERT 0 1
domaintest=# select * from integer_types;
 uscol | scol | uicol | icol | ubcol | bcol | uncol | ncol

Re: [HACKERS] [RFC] Unsigned integer support.

2008-07-25 Thread Dann Corbit
 -Original Message-
 From: Andrew Dunstan [mailto:[EMAIL PROTECTED]
 Sent: Friday, July 25, 2008 1:11 PM
 To: Dann Corbit
 Cc: Tom Lane; Ryan Bradetich; Gregory Stark; pgsql-
 [EMAIL PROTECTED]
 Subject: Re: [HACKERS] [RFC] Unsigned integer support.
 
 
 
 Dann Corbit wrote:
 
  CREATE DOMAIN usmallint AS SMALLINT CHECK(VALUE > 0);
  CREATE DOMAIN uinteger AS INTEGER CHECK(VALUE > 0);
  CREATE DOMAIN ubigint AS BIGINT CHECK(VALUE > 0);
  CREATE DOMAIN unumeric AS NUMERIC CHECK(VALUE > 0);
 
 
 
 
 s/>/>=/g

I turned off the default option to remove extra line breaks.  
Future posts should not be quite as even and bletcherous.
God willing, and the crick don't rise.




Re: [HACKERS] [RFC] Unsigned integer support.

2008-07-25 Thread Dann Corbit
 -Original Message-
 From: Andrew Dunstan [mailto:[EMAIL PROTECTED]
 Sent: Friday, July 25, 2008 1:28 PM
 To: Dann Corbit
 Cc: Tom Lane; Ryan Bradetich; Gregory Stark; pgsql-
 [EMAIL PROTECTED]
 Subject: Re: [HACKERS] [RFC] Unsigned integer support.
 
 
 
 Dann Corbit wrote:
  -Original Message-
  From: Andrew Dunstan [mailto:[EMAIL PROTECTED]
  Sent: Friday, July 25, 2008 1:11 PM
  To: Dann Corbit
  Cc: Tom Lane; Ryan Bradetich; Gregory Stark; pgsql-
  [EMAIL PROTECTED]
  Subject: Re: [HACKERS] [RFC] Unsigned integer support.
 
 
 
  Dann Corbit wrote:
 
   CREATE DOMAIN usmallint AS SMALLINT CHECK(VALUE > 0);
   CREATE DOMAIN uinteger AS INTEGER CHECK(VALUE > 0);
   CREATE DOMAIN ubigint AS BIGINT CHECK(VALUE > 0);
   CREATE DOMAIN unumeric AS NUMERIC CHECK(VALUE > 0);
 
 
 
 
   s/>/>=/g
 
 
  I turned off the default option to remove extra line breaks.
  Future posts should not be quite as even and bletcherous.
  God willing, and the crick don't rise.
 
 
 
 I suspect you're missing my point, namely that 0 should be an allowed
 value for unsigned types.

Quite right.  The domains I created were really the 'natural numbers'
rather than unsigned types.



Re: [HACKERS] Research/Implementation of Nested Loop Join optimization

2008-07-23 Thread Dann Corbit
When you install the source tree (e.g. in folder \postgresql-8.3.x) you
will want to examine nodeMergejoin.c typically found in a path similar
to this:

\postgresql-8.3.x\src\backend\executor\nodeMergejoin.c

 

Here are the comments from the version on my machine:

 

/*
 * INTERFACE ROUTINES
 *      ExecMergeJoin           mergejoin outer and inner relations.
 *      ExecInitMergeJoin       creates and initializes run time states
 *      ExecEndMergeJoin        cleans up the node.
 *
 * NOTES
 *
 *      Merge-join is done by joining the inner and outer tuples satisfying
 *      join clauses of the form ((= outerKey innerKey) ...).
 *      The join clause list is provided by the query planner and may contain
 *      more than one (= outerKey innerKey) clause (for composite sort key).
 *
 *      However, the query executor needs to know whether an outer
 *      tuple is "greater/smaller" than an inner tuple so that it can
 *      "synchronize" the two relations. For example, consider the following
 *      relations:
 *
 *              outer: (0 ^1 1 2 5 5 5 6 6 7)   current tuple: 1
 *              inner: (1 ^3 5 5 5 5 6)         current tuple: 3
 *
 *      To continue the merge-join, the executor needs to scan both inner
 *      and outer relations till the matching tuples 5. It needs to know
 *      that currently inner tuple 3 is "greater" than outer tuple 1 and
 *      therefore it should scan the outer relation first to find a
 *      matching tuple and so on.
 *
 *      Therefore, rather than directly executing the merge join clauses,
 *      we evaluate the left and right key expressions separately and then
 *      compare the columns one at a time (see MJCompare).  The planner
 *      passes us enough information about the sort ordering of the inputs
 *      to allow us to determine how to make the comparison.  We may use the
 *      appropriate btree comparison function, since Postgres' only notion
 *      of ordering is specified by btree opfamilies.
 *
 *      Consider the above relations and suppose that the executor has
 *      just joined the first outer "5" with the last inner "5". The
 *      next step is of course to join the second outer "5" with all
 *      the inner "5's". This requires repositioning the inner "cursor"
 *      to point at the first inner "5". This is done by "marking" the
 *      first inner 5 so we can restore the "cursor" to it before joining
 *      with the second outer 5. The access method interface provides
 *      routines to mark and restore to a tuple.
 *
 *      Essential operation of the merge join algorithm is as follows:
 *
 *      Join {
 *          get initial outer and inner tuples              INITIALIZE
 *          do forever {
 *              while (outer != inner) {                    SKIP_TEST
 *                  if (outer < inner)
 *                      advance outer                       SKIPOUTER_ADVANCE
 *                  else
 *                      advance inner                       SKIPINNER_ADVANCE
 *              }
 *              mark inner position                         SKIP_TEST
 *              do forever {
 *                  while (outer == inner) {
 *                      join tuples                         JOINTUPLES
 *                      advance inner position              NEXTINNER
 *                  }
 *                  advance outer position                  NEXTOUTER
 *                  if (outer == mark)                      TESTOUTER
 *                      restore inner position to mark      TESTOUTER
 *                  else
 *                      break   // return to top of outer loop
 *              }
 *          }
 *      }
 *
 *      The merge join operation is coded in the fashion
 *      of a state machine.  At each state, we do something and then
 *      proceed to another state.  This state is stored in the node's
 *      execution state information and is preserved across calls to
 *      ExecMergeJoin. -cim 10/31/89
 */

 


Re: [HACKERS] Research/Implementation of Nested Loop Join optimization

2008-07-23 Thread Dann Corbit
 

From: Manoel Henrique [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 23, 2008 1:47 PM
To: Dann Corbit
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Research/Implementation of Nested Loop Join
optimization

 

nodeMergejoin.c is the code for the Merge Join, isn't it? I am trying
to find a way to change the Nested Loop Join. It would be more like in
nodeNestloop.c: when rescanning the inner plan (the second time scanning
the inner plan, and so on) it would change the scan direction. If the
scan direction was from first tuple to last tuple it would go backwards;
if it was from last to first it would go forward. The code I'm looking
at right now is from 8.3.1; it seems to have some kind of direction
manager, but it doesn't seem to be in use.

 

You are right.  You want file: nodeNestloop.c
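The idea under discussion, reversing the inner scan on each rescan so that the tuples read most recently (and most likely still cached) are revisited first, can be sketched abstractly. This is a hypothetical Python illustration, not the nodeNestloop.c code:

```python
def nestloop_join_alternating(outer, inner, match):
    """Nested-loop join that flips the inner scan direction on every
    rescan, so the inner tuples read most recently (and most likely
    still in the buffer) are revisited first."""
    results = []
    forward = True
    for o in outer:
        scan = inner if forward else list(reversed(inner))
        for i in scan:
            if match(o, i):
                results.append((o, i))
        forward = not forward  # alternate direction for the next rescan
    return results
```

The join result is the same set of pairs as a plain nested loop; only the inner read pattern (and hence the buffer behavior) changes.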

 

 



Re: [HACKERS] [PATCH]-hash index improving

2008-07-22 Thread Dann Corbit
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
 [EMAIL PROTECTED] On Behalf Of Xiao Meng
 Sent: Tuesday, July 22, 2008 7:57 PM
 To: Simon Riggs
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] [PATCH]-hash index improving
 
 Well, I'll do it after I finish my second patch.
 Hash index should be more efficient than btree when N is big enough.
 It seems meaningful to find how big N is in an experiment way.

The savings will depend on many factors.  Another thing (besides volume) which 
is important is the sort of key data being indexed.

Consider a unique key on six varchar(40) fields:

1.  Country
2.  State/Province
3.  City
4.  Division
5.  Branch
6.  Office

Typically, a key such as this will have lots of clumps of similar data, only 
being differentiated with the final segment.  This sort of index is often used 
for reporting purposes.  To determine a unique entry, it is not unlikely that 
more than 200 characters will be traversed.  A hash index gets a special boost 
here because a much smaller data signature is stored.  Even a 64 bit hash 
occupies only 8 bytes.  On the other hand, consider an index on a field 
consisting of a single character.  Here, the pages of the b-tree will have a 
huge volume of entries per page, requiring fewer pages to search, and the hash 
index is many times larger and hence more pages will have to be loaded.

These things make a hash index desirable:
1. Unique index
2. Long keys
3. Huge data cardinality
4. Equality search

These things make a hash index undesirable:
1.  Non-unique index
2.  Short keys
3.  Small data sets

These things render a hash index as worthless (except in COMBINATION with a 
b-tree type index):
1.  Need for range searches like BETWEEN
2.  Need for ORDER BY on the column(s)

As an aside:
I guess it will also be nice if you can CLUSTER both index and data values on 
the hash.  It may need a different algorithm than a b-tree clustering concept.  
I know that Rdb uses different kinds of storage areas for hashed indexes versus 
b-tree indexes.

This effort to create hashed indexes is very valuable.  Because it becomes more 
and more dominant as the data scales up, right at the point when things get 
tougher is when it becomes the most helpful.  If you have a tiny table, it does 
not even matter if you index it, because (for instance) 10 rows will probably 
always stay in memory and iteration will find what is needed instantly.  But if 
you have hundreds of millions of rows or billions of rows, now is when 
performance really matters.  So when the data scales to preposterous size 
(which it has an uncanny ability to do) the boost of performance becomes even 
more valuable.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH]-hash index improving

2008-07-17 Thread Dann Corbit
-Original Message-

 From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Simon Riggs
 Sent: Thursday, July 17, 2008 4:10 PM

 To: Jonah H. Harris

 Cc: Xiao Meng; pgsql-hackers@postgresql.org; Kenneth Marshall

 Subject: Re: [HACKERS] [PATCH]-hash index improving





 On Thu, 2008-07-17 at 16:24 -0400, Jonah H. Harris wrote:

  I'm not really seeing any performance improvements over b-tree.



 I'd like to see a theoretical analysis on the benefit case before we

 spend too many teraflops on investigation.



 In which cases do we expect that hash indexes will beat btrees?

A unique-index equality search on a large table should be very fast with
a hashed index (and that is the only place where any advantage will be
seen).  Hashed indexes are useless for any search besides equality, and
they gain more and more as the number of levels in the b-tree index
increases.

Here is a hash index lookup that is frightfully fast:
http://www.corpit.ru/mjt/tinycdb.html

It's a constant database, but the file format might be worth
examination.  Here is a quickie definition of the CDB format:
http://cr.yp.to/cdb/cdb.txt

I think that the problem with hashed indexes tends to be the blocking of
index pages on disk.  For instance, the FastDB/GigaBase author was able
to make FastDB memory based hash indexes that are faster than the memory
based b-tree versions, but not for the disk based GigaBase database.
The only way to get better performance from hash based indexes is to
read fewer index pages than if a tree-based index were used.  So I think
that the scheme used to create the index pages is the focus to make them
worthwhile.




Re: [HACKERS] TIMESTAMP and daylight savings time question

2008-03-14 Thread Dann Corbit
We figured the problem out.
Our older version did not have the OS patch:
http://www.postgresqlforums.com/wiki/2007_Daylight_Savings_Time


From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Dann Corbit
Sent: Wednesday, March 12, 2008 6:06 PM
To: pgsql-hackers@postgresql.org
Cc: Larry McGhaw
Subject: [HACKERS] TIMESTAMP and daylight savings time question

It appears that older versions of PostgreSQL (7.x) do not consider
daylight saving time when using TIMESTAMP WITH TIME ZONE, but the
most recent versions (8.x) do.
Can someone tell me the exact PostgreSQL version number where the
behavior changed?




[HACKERS] TIMESTAMP and daylight savings time question

2008-03-12 Thread Dann Corbit
It appears that older versions of PostgreSQL (7.x) do not consider
daylight saving time when using TIMESTAMP WITH TIME ZONE, but the
most recent versions (8.x) do.

Can someone tell me the exact PostgreSQL version number where the
behavior changed?

 



Re: [HACKERS] bug in numeric_power() function

2008-03-11 Thread Dann Corbit
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
 [EMAIL PROTECTED] On Behalf Of Richard Wang
 Sent: Tuesday, March 11, 2008 7:38 PM
 To: pgsql-hackers@postgresql.org
 Subject: [HACKERS] bug in numeric_power() function
 
 I expected 0 ^ 123.3 to be 0, but it reported error as follows
 
 postgres=# select 0 ^ 123.3;
 ERROR:  cannot take logarithm of zero
 
 I find that there is a bug in the numeric_power() function:
 the function calculates a ^ b via the algorithm e ^ (ln a * b);
 as you see, ln 0 is not valid.

It seems an obvious work-around that:
if (b == 0) return 1;
if (a == 0) return 0;
could be inserted at the top.
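A minimal sketch of that guard in front of the e ^ (ln a * b) formulation (illustrative Python, not the numeric.c code; negative bases are out of scope here):

```python
import math

def power(a, b):
    """a ^ b computed as exp(b * ln a), with the zero cases handled up
    front so the logarithm of zero is never taken.  Assumes a >= 0."""
    if b == 0:
        return 1.0  # checked first, so 0 ^ 0 yields 1 by this convention
    if a == 0:
        return 0.0  # avoids "cannot take logarithm of zero"
    return math.exp(b * math.log(a))
```

Note that checking b == 0 before a == 0 decides the 0 ^ 0 case in favor of 1, which matches the order of the checks suggested above.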

Aside:
Having the ^ operator overloaded for exponentiation surprises me.

Does it really have the right precedence for performing exponentiation?
(E.g. you cannot do this in C++ because ^ will have the precedence of
xor, which is wrong).




Re: [HACKERS] Will PostgreSQL get ported to CUDA?

2008-01-30 Thread Dann Corbit
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
 [EMAIL PROTECTED] On Behalf Of Christopher Browne
 Sent: Wednesday, January 30, 2008 9:56 AM
 To: pgsql-hackers@postgresql.org
 Subject: [HACKERS] Will PostgreSQL get ported to CUDA?
 
 2008/1/30 Dann Corbit [EMAIL PROTECTED]:

 http://www.scientificcomputing.com/ShowPR~PUBCODE~030~ACCT~300100~ISSUE~0801~RELTYPE~HPCC~PRODCODE~~PRODLETT~C.html
 
  http://www.nvidia.com/object/cuda_learn.html
 
  http://www.nvidia.com/object/cuda_get.html
 
 Someone at CMU has tried this, somewhat fruitfully.
 
 http://www.andrew.cmu.edu/user/ngm/15-823/project/Draft.pdf
 http://www.andrew.cmu.edu/user/ngm/15-823/project/Final.pdf
 
 This was based on GPUSort:
 http://gamma.cs.unc.edu/GPUSORT/
 
 Unfortunately, the licensing of GPUSort is, if anything, more awful
 than that for CUDA.
 http://gamma.cs.unc.edu/GPUSORT/terms.html
 
 This would need to get pretty totally reimplemented to be useful with
 PostgreSQL.  Happily, we actually have some evidence that the exercise
 would be of some value.  Further, it looks to me like the
 implementation that was done was done in a pretty naive way.
 Something done more seriously would likely be radically better...

It's too bad that they have a restrictive license.

Perhaps there is an opportunity to create an information appliance
that contains a special build of PostgreSQL, a nice heap of super-speedy
disk, and a big pile of GPUs for sort and merge type operations.  The
thing that seems nice to me about this idea is that you would have a
very stable test platform (all hardware and software combinations would
be known and thoroughly tested) and you might also get some extreme
performance.

I guess that a better sort than GPUSort could be written from scratch,
but legal entanglements with the use of the graphics cards may make the
whole concept DOA.


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] Will PostgreSQL get ported to CUDA?

2008-01-29 Thread Dann Corbit
http://www.scientificcomputing.com/ShowPR~PUBCODE~030~ACCT~300100~ISSUE~0801~RELTYPE~HPCC~PRODCODE~~PRODLETT~C.html

http://www.nvidia.com/object/cuda_learn.html

http://www.nvidia.com/object/cuda_get.html

 

 



Re: [HACKERS] operator suggest interval / interval = numeric

2008-01-09 Thread Dann Corbit
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
 [EMAIL PROTECTED] On Behalf Of Tom Lane
 Sent: Wednesday, January 09, 2008 10:00 PM
 To: Warren Turkal
 Cc: Brendan Jurd; Ilya А. Кovalenko; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] operator suggest  interval / interval = numeric
 
 Warren Turkal [EMAIL PROTECTED] writes:
  YEAR TO MONTH and DAY TO {HOUR,MINUTE,SECOND} intervals should not
  combine. PostgreSQL correctly doesn't allow {YEAR,MONTH} TO
  {DAY,HOUR,MINUTE,SECOND} intervals,
 
 Really?  I think you've confused some unimplemented decorative syntax
 with what the underlying datatype will or won't do.
 
  This is inconsistent. I would like to ultimately not allow operations
  on interval combinations that are not allowed by the SQL standard.
 
 The spec's approach to datetime operations in general is almost totally
 brain-dead, and so you won't find a lot of support around here for hewing
 to the straight-and-narrow-spec-compliance approach.  If they have not
 even heard of daylight-savings time, how can anyone credit them with any
 meaningful contact with the real world?  We'll cite the spec where it
 suits us, but in this area "the spec says you can't do that" carries
 very little weight.
 
 Or were you planning to lobby for removal of our DST support, too?

Don't forget indexes.  The standard does not breathe a word about them.




Re: [HACKERS] Sorting Improvements for 8.4

2007-12-20 Thread Dann Corbit
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
 [EMAIL PROTECTED] On Behalf Of Brian Hurt
 Sent: Thursday, December 20, 2007 6:42 AM
 To: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Sorting Improvements for 8.4
 
 While we're blue-skying things, I've had an idea for a sorting algorithm
 kicking around for a couple of years that might be interesting.  It's a
 variation on heapsort to make it significantly more block-friendly.  I
 have no idea if the idea would work, or how well it'd work, but it might
 be worthwhile kicking around.
 
 Now, the core idea of heapsort is that the array is put into heap order-
 basically, that a[i] >= a[2i+1] and a[i] >= a[2i+2] (doing the 0-based
 array version here).  The problem is that, assuming that the length of a
 is larger than memory, then a[2i+1] is likely going to be on a different
 page or block than a[i].  That means every time you have to bubble down
 a new element, you end up reading O(log N) blocks- this is *per element*.
 
 The variation is to instead work with blocks, so you have a block of
 entries b[i], and you change the definition of heap order, so that
 min(b[i]) >= max(b[2i+1]) and min(b[i]) >= max(b[2i+2]).  Also, during
 bubble down, you need to be careful to only change the minimum value of
 one of the two child blocks b[2i+1] and b[2i+2].  Other than that, the
 algorithm works as normal.  The advantage of doing it this way is that
 while each bubble down still takes O(log N) blocks being touched, you
 get an entire block worth of results for your effort.  Make your blocks
 large enough (say, 1/4 the size of workmem) and you greatly reduce N,
 the number of blocks you have to deal with, and get much better I/O
 (when you're reading, you're reading megabytes at a shot).
 
 Now, there are boatloads of complexities I'm glossing over here.  This
 is more of a sketch of the idea.  But it's something to consider.

It's an interesting idea to work with a heap of heaps where you try to
keep each heap page-sized.  It reminds me of the B+ tree, where you
collect a whole bunch of nodes into a single page.

I don't know if you have examined weak-heaps, but there are some
interesting results for weak-heap approaches.  As you know, heapsort
variants do not degenerate to O(N^2).

On this link:
http://www.jea.acm.org/2002/EdelkampHeapsort/

I highly recommend all the goodies he has embedded (papers, source,
etc.)




Re: [HACKERS] Sorting Improvements for 8.4

2007-12-19 Thread Dann Corbit
As long as sorting improvements are being considered, may I suggest an
experiment that uses a very simple model?

Assuming that you have K subfiles created by the initial sorting pass,
insert the top record of each file into a priority queue.

Then, emit records from the queue until the priority queue is empty.

Now, there will be the objection that we will be jumping willy-nilly all
over the disk because of reading one record at a time, but (depending on
how it is implemented) generally several records are buffered during a
read.

So (as a gentle suggestion) I suggest testing the model.  It works great
for a single CPU or multiple CPU system for the work that *I* do.  I
have no idea if it will be a benefit for PostgreSQL or not, but it
should be a very simple matter to try it.  As long as someone is doing
the work right now, it would be a good time to give it a go.

I am not very familiar with PostgreSQL internals, but I would be willing
to give a hand with it (not really sure how much time I can guarantee,
though, since I would be doing it on my free time).




Re: [HACKERS] Sorting Improvements for 8.4

2007-12-19 Thread Dann Corbit
 -Original Message-
 From: Jeff Davis [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, December 19, 2007 3:10 PM
 To: Dann Corbit
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Sorting Improvements for 8.4
 
 On Wed, 2007-12-19 at 14:41 -0800, Dann Corbit wrote:
  As long as sorting improvements are being considered, may I suggest
an
  experiment that uses a very simple model?
 
  Assuming that you have K subfiles created by the initial sorting
pass,
  insert the top record of each file into a priority queue.
 
  Then, emit records from the queue until the priority queue is empty.
 
 
 What is the principle difference between that idea and our existing
sort
 algorithm?
 
 There's a good explanation in the comment at the top of tuplesort.c.


According to the comments, PostgreSQL uses replacement selection.
Replacement selection is a wonderful thing because it creates runs that
are twice as long as normal due to the snowplow effect.  See (for
instance):
http://ieeexplore.ieee.org/Xplore/login.jsp?url=/iel5/69/27216/01209012.pdf

Then, the merge routine will have half as many runs to merge the files
together.

So if, for instance, a straight sort of each memory load would create
1024 subfiles, replacement selection will create only 512.  That saves
one merge pass.
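For comparison, here is a toy model of replacement selection (illustrative Python, not the tuplesort.c implementation): a record that arrives smaller than the one just written cannot join the current run and is tagged for the next one. This is what produces runs averaging twice the memory size on random input; on already-sorted input a single run covers everything, while on reverse-sorted input each run is exactly one memory load:

```python
import heapq
from itertools import islice

_DONE = object()  # sentinel marking input exhaustion

def replacement_selection(records, memory):
    """Produce sorted runs using a heap of `memory` records.  Each heap
    entry is (run_number, record); a new record smaller than the record
    just written is tagged with the next run number."""
    it = iter(records)
    heap = [(0, r) for r in islice(it, memory)]
    heapq.heapify(heap)
    runs, current = [], []
    while heap:
        run_no, r = heapq.heappop(heap)
        if run_no > len(runs):        # the heap has moved on to the next run
            runs.append(current)
            current = []
        current.append(r)
        nxt = next(it, _DONE)
        if nxt is not _DONE:
            heapq.heappush(heap, (run_no if nxt >= r else run_no + 1, nxt))
    runs.append(current)
    return runs
```

On sorted input of 100 records with a 10-record memory this emits one run of 100; on reverse-sorted input it emits ten runs of 10, the worst case the paragraph alludes to.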

The algorithm that I am suggesting will take exactly one pass to merge
all of the files.

It works like this...

Imagine an array of pointers to the subfiles:
[*subfile][*subfile]...[*subfile]

Step 0:
We sort the array by a comparison operator that examines the top element
of each subfile.  So now the array is ordered such that the record with
the smallest key is in array slot 0.

Step 1:
We remove the first record from the subfile in array slot 0.  Now, the
priority of the first element *may* have changed.  So if it is no longer
smaller than the subfile immediately to the right, we do a binary
insertion to put this subfile in its new location, moving the contents
of array slot[1] to array slot 0 if it is needed.

Step 2:
Is the entire list of subfiles empty?  If yes, then terminate, if no
then go to Step 1.

Like I said, it is ultra-simple and it sorts the entire contents of all
subfiles to the output with a single pass.
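Steps 0 through 2 can be sketched directly (illustrative Python, with in-memory lists standing in for the subfiles):

```python
import bisect

def kway_merge(subfiles):
    """Single-pass merge of K sorted runs, as described above: keep the
    runs ordered by their head element, emit from the first run, then
    binary-insert that run back into its new position."""
    runs = [list(r) for r in subfiles if r]  # copy, drop empty runs
    runs.sort(key=lambda r: r[0])            # Step 0: order runs by head element
    out = []
    while runs:                              # Step 2: loop until no runs remain
        head = runs.pop(0)
        out.append(head.pop(0))              # Step 1: emit the smallest record
        if head:
            # binary insertion of the run at its new position
            keys = [r[0] for r in runs]
            runs.insert(bisect.bisect_left(keys, head[0]), head)
    return out
```

The point of the sketch is the access pattern: every record is moved to the output exactly once, so the merge is a single pass no matter how many subfiles there are.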

Consider the way that current replacement selection works.  Its actual
asymptotic behavior is a terrible O(n^2).  But because we save one full
merge pass, it is usually worth it anyway, since memory access is much
faster than disk.  And if we only have a few subfiles, the savings will
be large.  In the case of a priority queue merge, we only have one
single merge pass no matter how many subfiles there are.





Re: [HACKERS] Sorting Improvements for 8.4

2007-12-19 Thread Dann Corbit
P.S.
A beautiful paper on replacement selection is found here:
http://students.fim.uni-passau.de/~fickensc/Proseminar/Proseminar.pdf




Re: [HACKERS] Sorting Improvements for 8.4

2007-12-19 Thread Dann Corbit
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
 [EMAIL PROTECTED] On Behalf Of Gregory Stark
 Sent: Wednesday, December 19, 2007 5:26 PM
 To: Jeff Davis
 Cc: Mark Mielke; Michał Zaborowski; Simon Riggs; Ron Mayer; pgsql-
 [EMAIL PROTECTED]
 Subject: Re: [HACKERS] Sorting Improvements for 8.4
 
 
 Jeff Davis [EMAIL PROTECTED] writes:
 
  test=> explain analyze select * from sorter order by t;
  test=> explain analyze select * from sorter order by b;
  test=> explain analyze select * from sorter order by f;
 
  On my machine this table fits easily in memory (so there aren't any disk
  reads at all). Sorting takes 7 seconds for floats, 9 seconds for binary
  data, and 20 seconds for localized text. That's much longer than it
  would take to read that data from disk, since it's only 70MB (which
  takes a fraction of a second on my machine).
 
  I think this disproves your hypothesis that sorting happens at disk
  speed.
 
 I suspect most of that is spent just copying the data around. Which would
 not
 be helped by having multiple threads doing the copying -- and in fact
 might be
 exacerbated if it required an extra copy to consolidate all the data in
 the
 end.

Benchmarking a single system will really only explain that system.
Someone may have a disk farm with 2GB/Sec throughput:
http://www.sql-server-performance.com/articles/per/system_storage_configuration_p1.aspx

But such a configuration is very unlikely.

Someone may have 10 Gb/s NICs, but those, too, are rare.

So for any benchmark, we will really just end up with a number for that system.

Typically, disk is the bottleneck.
I found this on the net somewhere, but it's quite a useful table for capacity 
planning (to find the weak link in the chain using back of the envelope 
calculations):

Interface                                 Width            Frequency    Bytes/Sec  Bits/Sec
4-way interleaved PC1600 (DDR200) SDRAM   4 x 64 bits      100 MHz DDR  6.4 GB/s   51 Gbps
Opteron HyperTransport memory bus         128 bits         200 MHz DDR  6.4 GB/s   51 Gbps
Pentium 4 800 MHz FSB                     64 bits          200 MHz QDR  6.4 GB/s   51 Gbps
PC2 6400 (DDR-II 800) SDRAM               64 bits          400 MHz DDR  6.4 GB/s   51 Gbps
PC2 5300 (DDR-II 667) SDRAM               64 bits          333 MHz DDR  5.3 GB/s   43 Gbps
Pentium 4 533 MHz FSB                     64 bits          133 MHz QDR  4.3 GB/s   34 Gbps
PC2 4300 (DDR-II 533) SDRAM               64 bits          266 MHz DDR  4.3 GB/s   34 Gbps
2-channel PC1066 RDRAM                    2 x 16 bits      533 MHz DDR  4.3 GB/s   34 Gbps
PCI-X 533                                 64 bits          533 MHz      4.3 GB/s   34 Gbps
PCI-Express x16                           serial/16 lanes  2.5 GHz      4 GB/s     32 Gbps
Pentium 4 400 MHz FSB                     64 bits          100 MHz QDR  3.2 GB/s   25.6 Gbps
2-channel PC800 RDRAM                     2 x 16 bits      400 MHz DDR  3.2 GB/s   25.6 Gbps
2-way interleaved PC1600 (DDR200) SDRAM   2 x 64 bits      100 MHz DDR  3.2 GB/s   25.6 Gbps
PC2 3200 (DDR-II 400) SDRAM               64 bits          200 MHz DDR  3.2 GB/s   25.6 Gbps
PC3200 (DDR400) SDRAM                     64 bits          200 MHz DDR  3.2 GB/s   25.6 Gbps
PC2700 (DDR333) SDRAM                     64 bits          167 MHz DDR  2.7 GB/s   21 Gbps
PC2100 (DDR266) SDRAM                     64 bits          133 MHz DDR  2.1 GB/s   17 Gbps
AGP 8x                                    32 bits          533 MHz      2.1 GB/s   17 Gbps
PCI-X 266                                 64 bits          266 MHz      2.1 GB/s   17 Gbps
PCI-Express x8                            serial/8 lanes   2.5 GHz      2 GB/s     16 Gbps
EV6 bus (Athlon/Duron FSB)                64 bits          100 MHz DDR  1.6 GB/s   13 Gbps
PC1600 (DDR200) SDRAM                     64 bits          100 MHz DDR  1.6 GB/s   13 Gbps
PC800 RDRAM                               16 bits          400 MHz DDR  1.6 GB/s   13 Gbps
PC150 SDRAM                               64 bits          150 MHz      1.3 GB/s   10.2 Gbps
10 gigabit ethernet                       serial           10 GHz       1.25 GB/s  10 Gbps
OC-192                                    serial           9.953 GHz    1.24 GB/s  9.953 Gbps
133 MHz FSB                               64 bits          133 MHz      1.06 GB/s  8.5 Gbps
PC133 SDRAM                               64 bits          133 MHz      1.06 GB/s  8.5 Gbps
AGP 4x                                    32 bits          266 MHz      1.06 GB/s  8.5 Gbps
PCI-X                                     64 bits          133 MHz      1.06 GB/s  8.5 Gbps
PCI-Express x4                            serial/4 lanes   2.5 GHz      1 GB/s     8 Gbps
100 MHz FSB                               64 bits          100 MHz      800 MB/s   6.4 Gbps
PC100 SDRAM                               64 bits          100 MHz      800 MB/s   6.4 Gbps
PC66 SDRAM                                64 bits          66 MHz       533 MB/s   4.3 Gbps
fast/wide PCI                             64 bits          66 MHz       533 MB/s   4.3 Gbps
AGP 2x                                    32 bits          133 MHz      533 MB/s   4.3 Gbps
single-link DVI                           12 bits          165 MHz DDR  495 MB/s   3.96 Gbps
Ultra-320 SCSI                            16 bits          160 MHz      320 MB/s   2.6 Gbps
OC-48 network                             serial           2.488 GHz    311 MB/s   2.488 Gbps
AGP                                       32 bits          66 MHz       266 MB/s   2.1 Gbps
PCI-Express x1                            serial           2.5 GHz      250 MB/s   2 Gbps
Serial ATA/1500 disk                      serial           1.5 GHz      187 MB/s   1.5 Gbps
Ultra-160 SCSI                            16 bits          80 MHz       160 MB/s   1.3 Gbps
OC-24 network                             serial           1.244 GHz    155 MB/s   1.244 Gbps
PCI                                       32 bits          33 MHz       133 MB/s   1.06 Gbps
ATA/133 disk                              8 bits           66 MHz DDR   133 MB/s   1.06 Gbps
gigabit ethernet                          serial           1 GHz        125 MB/s   1 Gbps
ATA/100 disk                              8 bits           50 MHz DDR   100 MB/s   800 Mbps

Re: [HACKERS] Current code for function j2date does not have the same correct dynamic range as older code.

2007-12-12 Thread Dann Corbit
From: Dann Corbit
Sent: Tuesday, December 11, 2007 5:58 PM
To: pgsql-hackers@postgresql.org
Subject: Current code for function j2date does not have the
same correct dynamic range as older code.
 
 It may not matter to the PostgreSQL group, since nothing goes
 wrong until the year is 1,465,002 or larger.  It may also be
 an artifact of the Microsoft Visual C++ compiler.  At any
 rate, the modular math to compute the year month and day do
 not remain accurate nearly as long for the new code as the
 old.  The old code here is j2dateOld(), and the new code is
 j2dateNew3().  I made a few tests with other internal types to
 try to extend the range of the new code, but neither long long
 or using mostly unsigned seems to restore the old range
 because the mathematics are not identical.  At any rate, here
 is a unit test driver you can fiddle with, if you so choose.

Correction:
It is the new routines that are sound.  The old routines had overflow
problems.

So *cough* nevermind.
[snip of code]



[HACKERS] Dumb question about binary cursors and #ifdef HAVE_INT64_TIMESTAMP

2007-12-11 Thread Dann Corbit
If I create a binary cursor on a recent version of PostgreSQL, how can I
tell if the timestamp data internally is an 8 byte double or an 8 byte
integer?

I see an #ifdef that changes the code path to compute timestamps as one
type or the other, but I do not know how to recognize the internal
format of the type that will be returned in a binary cursor.

How can I do that?

 



Re: [HACKERS] [GENERAL] Dumb question about binary cursors and #ifdefHAVE_INT64_TIMESTAMP

2007-12-11 Thread Dann Corbit
 -Original Message-
 From: Alvaro Herrera [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, December 11, 2007 1:11 PM
 To: Dann Corbit
 Cc: pgsql-hackers@postgresql.org; [EMAIL PROTECTED]
 Subject: Re: [GENERAL] Dumb question about binary cursors and
 #ifdefHAVE_INT64_TIMESTAMP
 
 Dann Corbit wrote:
  If I create a binary cursor on a recent version of PostgreSQL, how
can I
  tell if the timestamp data internally is an 8 byte double or an 8
byte
  integer?
 
  I see an #ifdef that changes the code path to compute timestamps as
one
  type or the other, but I do not know how to recognize the internal
  format of the type that will be returned in a binary cursor.
 
  How can I do that?
 
 SHOW integer_timestamp;
 
 (actually, IIRC, this is one of the params that the server will send
you
 at session start).

I guess that I am supposed to check for error on the statement?  What
does it look like when the query works?

This is what I get against PostgreSQL 8.2.5 using the pgAdmin III query
tool:

ERROR:  unrecognized configuration parameter "integer_timestamp"



Re: [HACKERS] [GENERAL] Dumb question about binary cursors and #ifdefHAVE_INT64_TIMESTAMP

2007-12-11 Thread Dann Corbit
 -Original Message-
 From: Alvaro Herrera [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, December 11, 2007 1:11 PM
 To: Dann Corbit
 Cc: pgsql-hackers@postgresql.org; [EMAIL PROTECTED]
 Subject: Re: [GENERAL] Dumb question about binary cursors and
 #ifdefHAVE_INT64_TIMESTAMP
 
 Dann Corbit wrote:
  If I create a binary cursor on a recent version of PostgreSQL, how
can I
  tell if the timestamp data internally is an 8 byte double or an 8
byte
  integer?
 
  I see an #ifdef that changes the code path to compute timestamps as
one
  type or the other, but I do not know how to recognize the internal
  format of the type that will be returned in a binary cursor.
 
  How can I do that?
 
 SHOW integer_timestamp;
 
 (actually, IIRC, this is one of the params that the server will send
you
 at session start).

Tom's post clued me in.
It's:
show integer_datetimes;

Or (in my case):
PQparameterStatus(conn, "integer_datetimes")




[HACKERS] Selecting a constant question: A summary

2007-06-12 Thread Dann Corbit
First a comment:

At CONNX Solutions Inc., we believe sincerely that we should do whatever
is necessary to make our customers prosper.  This means creation of
excellent tools and being responsive to customer needs.  Secondly, we
believe that we should treat the customers the way that we want to be
treated.

I think that the PostgreSQL group has managed the first objective, but
not the second.  Of course, that is only an opinion, but I think that
success hinges on both factors.  Our objective in this issue has also
been to improve PostgreSQL so that it can become more useful to the end
users and not to denigrate the work of the engineers that have toiled on
it.  I will also admit that frustration has caused our tone to become
sharp at times.  This is clearly a mistake on our part and for this, I
apologize.

 

Next, the problem:

According to SQL/CLI and ODBC 3.5, we should bind the length of a
character column.

 

Here are some references from the relevant documentation (SQL/CLI and
ODBC are clones of one another):


==

ANSI/ISO/IEC 9075-3-1999

for Information Technology

Database Language SQL 

Part 3: Call-Level Interface (SQL/CLI)

Section 6.5 BindCol

 

Along with function SQLBindCol from the ODBC specification

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/odbc/htm/odbcsqlbindcol.asp

 

 

This bit should be sufficient to explain what we are after:

BufferLength [Input]

Length of the *TargetValuePtr buffer in bytes.

 

The driver uses BufferLength to avoid writing past the end of the
*TargetValuePtr buffer when returning variable-length data, such as
character or binary data. Note that the driver counts the
null-termination character when returning character data to
*TargetValuePtr. *TargetValuePtr must therefore contain space for the
null-termination character or the driver will truncate the data.

When the driver returns fixed-length data, such as an integer or a date
structure, the driver ignores BufferLength and assumes the buffer is
large enough to hold the data. It is therefore important for the
application to allocate a large enough buffer for fixed-length data or
the driver will write past the end of the buffer.

 

SQLBindCol returns SQLSTATE HY090 (Invalid string or buffer length) when
BufferLength is less than 0 but not when BufferLength is 0. However, if
TargetType specifies a character type, an application should not set
BufferLength to 0, because ISO CLI-compliant drivers return SQLSTATE
HY090 (Invalid string or buffer length) in that case.


==

 

Now, there are times when (according to the spec) we have to defer
binding.  However, this causes great problems for end user tools and
should only be done in what is basically a dire emergency.

In the case of a SELECT query that selects a fixed constant of any sort,
it would be a definite improvement for PostgreSQL to give some sort of
upper maximum.

For example:

 

SELECT 'Eastern Division', sum(Inventory_level),
sum(Inventory_backorder), Manager_last_name FROM table_name WHERE
division_id = 9 GROUP BY Manager_last_name

 

Will return four columns of data.  The first column is of unknown length.
Imagine if you are a spreadsheet in OpenOffice:

http://www.openoffice.org/

which happens to support ODBC connections.  You would like to fill out a
report for the president of your company.  Unfortunately, the first
column is of unknown length

 

That makes it a bit difficult to format this spreadsheet.

 

Now, I will admit that we may not know a priori whether 'Eastern
Division' is character or Unicode or MBCS.  But in the worst-case
scenario it will be (16 + 1) * element_width bytes in length.  For some
Unicode character sets, element_width can be as much as 4, so that
leaves 68 octets as an upper possible maximum.

Now, you might protest that 68 bytes is much too large.  That is true,
but I know that if I allocate 68 bytes we will not have data truncation.
It is no worse than a varchar(255) field whose largest item is 15
characters wide.  The grid will successfully bind and we will be
able to produce the report.

Generally speaking, grids are smart enough to automatically resize
themselves to max_length(grid_column_title, grid_column_data) and so the
report will look very nice.

 

It is also true that it is possible for us to work around the problem.
We certainly can know the exact type information about the constants in
our queries and reformat the PostgreSQL queries to decorate them with
things like:

SELECT 'Eastern Division'::char(16),
sum(Inventory_level)::Numeric(16,4), sum(Inventory_backorder)
::Numeric(16,4), Manager_last_name FROM table_name WHERE division_id =
9 GROUP BY Manager_last_name

 

But it would be very nice if the database could provide a good estimate
for us so that PostgreSQL could work like all of the other database
systems.  Code full of 

Re: [HACKERS] Selecting a constant question

2007-06-12 Thread Dann Corbit
 -Original Message-
 From: Larry McGhaw
 Sent: Tuesday, June 12, 2007 1:40 PM
 To: Martijn van Oosterhout
 Cc: Andrew Dunstan; Hannu Krosing; Tom Lane; Alvaro Herrera; Dann
Corbit;
 Gregory Stark; pgsql-hackers@postgresql.org
 Subject: RE: [HACKERS] Selecting a constant question
 
 For what its worth .. Your statement about why we are the first people
to
 mention this problem really got me thinking.  Anyone who would attempt
to
 write an ODBC driver for Postgres would run into the exact same issue.
 So I installed the official Postgres ODBC driver, and ran the
identical
 query and here are my results:
 
 I probably should have looked at this first  There is a whole
Postgres
 ODBC dialog dedicated to the very subject of this thread:
 Handling of unknown data sizes.   The pgodbc driver is configured to
 treat unknowns as varchar(255) by default,
 As shown by my example below.  This can be configured up or down as
 desired.
 
 SQLExecDirect:
 In: hstmt = 0x003C18E0, szSqlStr = Select a,b,c, '123' ,
'123'::char(3),
 '123'::varchar(3) from..., cbSqlStr = -3
 Return:   SQL_SUCCESS=0
 
 Describe Column All:
 icol, szColName, *pcbColName, *pfSqlType, *pcbColDef, *pibScale,
 *pfNullable
 1, a, 1, SQL_VARCHAR=12, 20, 0, SQL_NULLABLE=1
 2, b, 1, SQL_CHAR=1, 10, 0, SQL_NULLABLE=1
 3, c, 1, SQL_INTEGER=4, 10, 0, SQL_NULLABLE=1
 4, ?column?, 8, SQL_VARCHAR=12, 255, 0, SQL_NULLABLE=1
 5, bpchar, 6, SQL_CHAR=1, 3, 0, SQL_NULLABLE=1
 6, varchar, 7, SQL_VARCHAR=12, 255, 0, SQL_NULLABLE=1
 
 From psqlodbc.h
 
  #define MAX_VARCHAR_SIZE  255   /* default maximum size of varchar
                                   * fields (not including null term) */
 
 So I guess the bottom line is that we are not the first to encounter
this
 problem .. Its just been covered up by assigning
 An arbitrary maximum size .. So I guess we will do the same and make
it
 configurable like the official postgres driver.

Of course, the downside here is that choosing a default will truncate
the data when the actual data is larger than the default chosen.


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


[HACKERS] Selecting a constant question

2007-06-11 Thread Dann Corbit
SELECT 1  FROM test.dbo.a_003 

 

gets about 60,000 records per second

 

SELECT '1'  FROM test.dbo.a_003 

 

gets about 600 records per second.

 

The cause is that postgres describes the return column as unknown
length 65534 in the 2nd case.

 

Since the value is a constant, it seems rather odd to make the length
65534 characters.  Why not make it char(1) or some other appropriate and
less costly data type?  After all, it isn't going to grow during the
query.

 



Re: [HACKERS] Selecting a constant question

2007-06-11 Thread Dann Corbit
 -Original Message-
 From: Gregory Stark [mailto:[EMAIL PROTECTED]
 Sent: Monday, June 11, 2007 12:48 PM
 To: Dann Corbit
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Selecting a constant question
 
 Dann Corbit [EMAIL PROTECTED] writes:
 
  SELECT 1  FROM test.dbo.a_003
 
  gets about 60,000 records per second
 
  SELECT '1'  FROM test.dbo.a_003
 
  gets about 600 records per second.
 
  The cause is that postgres describes the return column as unknown
  length 65534 in the 2nd case.
 
 Wait, back up. How does this cause it to go slower?

The issue is this:

Postgres describes the column with a typmod of -1 (unknown) and a length
of 65534.

This means that any client application must reserve 65534 bytes of
space for every row of data (in a grid control, for example), when
Postgres should know (and report) that the maximum length of the column
is 1.

This is not a PSQL issue, it's an issue with other products relying on
the accuracy of the reported postgres metadata for a given SQL
statement.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Selecting a constant question

2007-06-11 Thread Dann Corbit
 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED]
 Sent: Monday, June 11, 2007 1:32 PM
 To: Dann Corbit
 Cc: Gregory Stark; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Selecting a constant question
 
 Dann Corbit [EMAIL PROTECTED] writes:
  The issue is this:
  Postgres describes the column with a typmod of -1 (unknown) and a
length
  of 65534.
 
 Oh, you're looking at typlen not typmod.  Please observe the comments
in
 pg_type.h:
 
    /*
     * For a fixed-size type, typlen is the number of bytes we use to
     * represent a value of this type, e.g. 4 for an int4.  But for a
     * variable-length type, typlen is negative.  We use -1 to indicate
     * a varlena type (one that has a length word), -2 to indicate a
     * null-terminated C string.
     */
    int2        typlen;
 
 You should be treating typlen as signed not unsigned, and not assuming
a
 fixed width for any negative value.
 
 Since the width refers to the server internal representation, and not
to
 what comes down the wire, I find it pretty strange for an application
to
 be using typlen for anything at all actually.

Thanks for the response.

Since libpq function PQfsize returns -2 for all constant character
strings in SQL statements ... What is the proper procedure to determine
the length of a constant character column after query execution but
before fetching the first row of data?


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Selecting a constant question

2007-06-11 Thread Dann Corbit
 -Original Message-
 From: Martijn van Oosterhout [mailto:[EMAIL PROTECTED]
 Sent: Monday, June 11, 2007 1:46 PM
 To: Dann Corbit
 Subject: Re: [HACKERS] Selecting a constant question
 
 On Mon, Jun 11, 2007 at 01:29:37PM -0700, Dann Corbit wrote:
  Our application is using the libPQ interface to access postgres.
 
  The query is select '123' from tablename  .. the table is not
  important.
 
  After executing the query, we interrogate the metadata of the result
set
  using the PQfsize, PQfmod and PQftype functions.
 
 Did you read the documentation of the PQfsize function?
 
 PQfsize returns the space allocated for this column in a database row,
 in other words the size of the server's internal representation of the
 data type. (Accordingly, it is not really very useful to clients.) A
 negative value indicates the data type is variable-length.
 

http://www.postgresql.org/docs/8.2/interactive/libpq-exec.html#LIBPQ-EXE
C-
 SELECT-INFO
 
  The size of the column is returned as 65534  (or -2 if you consider
this
  a signed short value)
 
 It's variable length, you can't say anything more.

So what you are saying is that the constant '1' is variable length, and
there is no way to find out the maximum length from the database.


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Selecting a constant question

2007-06-11 Thread Dann Corbit
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
 [EMAIL PROTECTED] On Behalf Of Dann Corbit
 Sent: Monday, June 11, 2007 1:52 PM
 To: Martijn van Oosterhout
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Selecting a constant question
 
  -Original Message-
  From: Martijn van Oosterhout [mailto:[EMAIL PROTECTED]
  Sent: Monday, June 11, 2007 1:46 PM
  To: Dann Corbit
  Subject: Re: [HACKERS] Selecting a constant question
 
  On Mon, Jun 11, 2007 at 01:29:37PM -0700, Dann Corbit wrote:
   Our application is using the libPQ interface to access postgres.
  
   The query is select '123' from tablename  .. the table is not
   important.
  
   After executing the query, we interrogate the metadata of the
result
 set
   using the PQfsize, PQfmod and PQftype functions.
 
  Did you read the documentation of the PQfsize function?
 
  PQfsize returns the space allocated for this column in a database
row,
  in other words the size of the server's internal representation of
the
  data type. (Accordingly, it is not really very useful to clients.) A
  negative value indicates the data type is variable-length.
 
 

http://www.postgresql.org/docs/8.2/interactive/libpq-exec.html#LIBPQ-EXE
 C-
  SELECT-INFO
 
   The size of the column is returned as 65534  (or -2 if you
consider
 this
   a signed short value)
 
  It's variable length, you can't say anything more.
 
 So what you are saying is that the constant '1' is variable length,
and
 there is no way to find out the maximum length from the database.

I have a PostgreSQL feature request:

Report the maximum size of a variable length string from the server.

Surely, we cannot be the only people who will need this information.  If
(for example) someone wants to bind to a grid, then the maximum size has
to be known in advance.


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Selecting a constant question

2007-06-11 Thread Dann Corbit
 -Original Message-
 From: Gregory Stark [mailto:[EMAIL PROTECTED]
 Sent: Monday, June 11, 2007 2:41 PM
 To: Dann Corbit
 Cc: Martijn van Oosterhout; pgsql-hackers@postgresql.org; Larry McGhaw
 Subject: Re: [HACKERS] Selecting a constant question
 
 Dann Corbit [EMAIL PROTECTED] writes:
 
  Surely, we cannot be the only people who will need this information.
If
  (for example) someone wants to bind to a grid, then the maximum size
has
  to be known in advance.
 
 In fact psql needs it and implements this. It has to skim through the
 entire
 result set to calculate the column widths. It's quite a lot of work
but
 the
 server is in no better position to do it than psql.

Reading the data twice sounds a little painful.  What if there are 30
million rows?
 
 On the contrary the server is missing quite a bit of information of
how
 you
 intend to display the information. Do you need the number of bytes or
 characters? Are all the characters the same width in your display
system?
 What
 about currency symbols? Do you intend to reverse any quoting or just
 display
 backslashes?

Giving me the information about the data type will be enough.  As an
example, in this case we have varchar data.  If the server should be so
kind as to report varchar(1) for '1' or varchar(3) for '123' then I
would not have any difficulty binding the data to a grid.
 
 Even knowing how many characters and assuming fixed character widths
that
 wouldn't even be enough to set your grid control widths. Usually
people
 like
 numeric quantities decimal aligned and so two records 1.00 and
0.01
 will
 take much more width than two records with 1.00 and 2.00.

SQL*Server, Oracle, Ingres, DB/2 and other database systems somehow
manage to do it, so I guess it is not technically intractable.

I suspect that your own ODBC/JDBC and other drivers suffer from this
same effect.

Now, I do recognize that sometimes nobody is going to know how big
something is, including the server.  But with a query using a constant
it seems like it ought to be well defined to me.  Perhaps the
difficulties are escaping me because I am not familiar with the low
level guts of this problem.  But I suspect that lots of people besides
me would benefit if sizes of things were known when it is possible to
know them.

As I said before, I see that it cannot be known right now. So I am
putting it in as a feature request.

If you could be so kind as to point out the right spot to look in the
server code, I imagine we could fix it and check in the patch ourselves.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] Got no response last time on setsockopt post, so I thought I would reiterate.

2007-06-11 Thread Dann Corbit
These two calls make our remote queries via libpq about twice as fast on
average.  It seems to me like it might be a nice addition to the core
product's libpq (I poked it into the spot where the Nagle algorithm is
turned off, but another place would be fine too).  Can anyone give me a
reason why it is a bad idea to add this in?  If it were made a parameter
with a default of 64K, that would be even better.  Then it could be
tuned to particular systems for maximum throughput.

 

  on = 65535;

  if (setsockopt(conn->sock, SOL_SOCKET, SO_RCVBUF, (char *) &on,
                 sizeof(on)) < 0)
  {
      char  sebuf[256];

      printfPQExpBuffer(&conn->errorMessage,
                        libpq_gettext("could not set socket SO_RCVBUF window size: %s\n"),
                        SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
      return 0;
  }

  on = 65535;

  if (setsockopt(conn->sock, SOL_SOCKET, SO_SNDBUF, (char *) &on,
                 sizeof(on)) < 0)
  {
      char  sebuf[256];

      printfPQExpBuffer(&conn->errorMessage,
                        libpq_gettext("could not set socket SO_SNDBUF window size: %s\n"),
                        SOCK_STRERROR(SOCK_ERRNO, sebuf, sizeof(sebuf)));
      return 0;
  }

 



Re: [HACKERS] Selecting a constant question

2007-06-11 Thread Dann Corbit
 -Original Message-
 From: Alvaro Herrera [mailto:[EMAIL PROTECTED]
 Sent: Monday, June 11, 2007 3:16 PM
 To: Dann Corbit
 Cc: Gregory Stark; Martijn van Oosterhout;
pgsql-hackers@postgresql.org;
 Larry McGhaw
 Subject: Re: [HACKERS] Selecting a constant question
 
 Dann Corbit wrote:
 
   Dann Corbit [EMAIL PROTECTED] writes:
  
   In fact psql needs it and implements this. It has to skim through
the
   entire
   result set to calculate the column widths. It's quite a lot of
work
  but
   the
   server is in no better position to do it than psql.
 
  Reading the data twice sounds a little painful.  What if there are
30
  million rows?
 
 You get an out of memory error.
 
   On the contrary the server is missing quite a bit of information
of
   how you intend to display the information. Do you need the number
of
   bytes or characters? Are all the characters the same width in your
   display system?  What about currency symbols? Do you intend to
   reverse any quoting or just display backslashes?
 
  Giving me the information about the data type will be enough.  As an
  example, in this case we have varchar data.  If the server should be
so
  kind as to report varchar(1) for '1' or varchar(3) for '123' then I
  would not have any difficulty binding the data to a grid.
 
 Oh, you have the length information for each datum all right.  It's on
 the first four bytes of it.

Sure, but when I bind to a grid, I need to know a priori how big the
biggest returned instance can be.  Reading the entire data set twice to
learn the size of a constant seems rather conceptually odd to me.




Re: [HACKERS] Selecting a constant question

2007-06-11 Thread Dann Corbit
 -Original Message-
 From: Martijn van Oosterhout [mailto:[EMAIL PROTECTED]
 Sent: Monday, June 11, 2007 3:29 PM
 To: Dann Corbit
 Cc: Alvaro Herrera; Gregory Stark; pgsql-hackers@postgresql.org; Larry
 McGhaw
 Subject: Re: [HACKERS] Selecting a constant question
 
 On Mon, Jun 11, 2007 at 03:18:33PM -0700, Dann Corbit wrote:
  Sure, but when I bind to a grid, I need to know a-priori how big the
  biggest returned instance can be.  Reading the entire data set twice
to
  learn the size of a constant seems rather conceptually odd to me.
 
 To be honest, the concept that a widget requires a constant that can't
 be changed later is also a bit odd.

Not when the data itself is a constant that cannot be changed.

 There are many times you won't know
 beforehand how big the data is, surely the framework should be smart
 enough to handle these cases?

If it were impossible to know the size of a string constant supplied in
the query, then I think I would agree with you here.  However, it seems
to me that the maximum possible size of such a known, constant-width
string is not hard to determine.

 Start the width at 100, if it turns out to be too small, make it
 bigger...

If that were a good idea, then why report data sizes at all?  Just let
it always be a surprise when it comes streaming down the pipe.

Honestly, I cannot fathom this answer.




Re: [HACKERS] Got no response last time on setsockopt post, so I thought I would reiterate.

2007-06-11 Thread Dann Corbit
 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED]
 Sent: Monday, June 11, 2007 3:41 PM
 To: Dann Corbit
 Cc: pgsql-hackers@postgresql.org; Larry McGhaw
 Subject: Re: [HACKERS] Got no response last time on setsockopt post,
so I
 thought I would reiterate.
 
 Dann Corbit [EMAIL PROTECTED] writes:
  These two calls make our remote queries via libpq about twice as
fast on
  average.
 
 And, perhaps, cause even greater factors of degradation in other
 scenarios (not to mention the possibility of complete failure on some
 platforms).  You haven't provided nearly enough evidence that this is
 a safe change to make.

May I suggest:
http://www-didc.lbl.gov/TCP-tuning/setsockopt.html
http://www.ncsa.uiuc.edu/People/vwelch/net_perf/tcp_windows.html

We test against dozens of operating systems and we have never had a
problem (generally, we use our own tcp/ip network objects for
communication and we only recently figured out why PostgreSQL was
lagging so far behind and patched libPQ ourselves.)  Now, it will be
about 2 weeks before our full regressions have run against PostgreSQL on
all of our platforms, but we do adjust the TCP/IP window on all of our
clients and servers and have yet to find one that is unable to either
negotiate a decent size or ignore our request at worst.

However, I won't twist your arm.  I just wanted to be sure that those at
the PostgreSQL organization were aware of this simple trick.  Our
products run on:
Aix
BeOS
Hpux
Linux (everywhere, including mainframe zLinux)
MVS
SunOS
Solaris
OpenVMS Alpha
OpenVMS VAX
OpenVMS Itanium
Windows

And several others




Re: [HACKERS] Selecting a constant question

2007-06-11 Thread Dann Corbit
 -Original Message-
 From: Alvaro Herrera [mailto:[EMAIL PROTECTED]
 Sent: Monday, June 11, 2007 3:44 PM
 To: Dann Corbit
 Cc: Tom Lane; Gregory Stark; Martijn van Oosterhout; pgsql-
 [EMAIL PROTECTED]; Larry McGhaw
 Subject: Re: [HACKERS] Selecting a constant question
 
 Dann Corbit wrote:
 
  If the server bound the data as UNICODE, then it will tell me
  UNICODE(3).  I know how big this will be.
 
  In the worst case scenario it will fit in 3*4 = 12 bytes.
 
  If the server is built without UNICODE enabled, then it will
definitely
  fit in 3 bytes.
 
 Unless it's some other multibyte encoding.  And nowadays, the server
is
 always unicode enabled.  The stuff sent down the wire is unicode or
 not depending on a configuration parameter.

Even at that, we still know an absolute maximum of 12 bytes.



Re: [HACKERS] Selecting a constant question

2007-06-11 Thread Dann Corbit
 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED]
 Sent: Monday, June 11, 2007 3:50 PM
 To: Dann Corbit
 Cc: Martijn van Oosterhout; Alvaro Herrera; Gregory Stark; pgsql-
 [EMAIL PROTECTED]; Larry McGhaw
 Subject: Re: [HACKERS] Selecting a constant question
 
 Dann Corbit [EMAIL PROTECTED] writes:
  To be honest, the concept that a widget requires a constant that
can't
  be changed later is also a bit odd.
 
  Not when the data itself is a constant that cannot be changed.
 
 Surely this case is not sufficiently important to justify designing
 your entire application (not to mention the client/server protocol)
 around it.  You're always going to have variable-width columns in
there
 somewhere.

Right.  But normally I get back a length for those variable length
columns, or I can collect it from the metadata of the database.

Surely, PostgreSQL can determine the size of a constant string.
Otherwise it would be impossible to know if it would be safe to insert a
constant string into a database column.

PostgreSQL has decided upon a data type, and gives me data bound in that
type.  It is only the length that it is unwilling to divulge.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Got no response last time on setsockopt post, so I thought I would reiterate.

2007-06-11 Thread Dann Corbit
 -Original Message-
 From: Greg Smith [mailto:[EMAIL PROTECTED]
 Sent: Monday, June 11, 2007 4:09 PM
 To: Dann Corbit
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Got no response last time on setsockopt post,
so I
 thought I would reiterate.
 
 On Mon, 11 Jun 2007, Dann Corbit wrote:
 
  These two calls make our remote queries via libpq about twice as
fast on
  average.
 
 Can you comment a bit on what your remote queries are typically doing?
 You'll need to provide at least an idea what type of test case you're
 seeing the improvement on for others to try and replicate it.

We have literally thousands (maybe hundreds of thousands -- I'm not
totally sure exactly how many there are because I am in development and
not in testing) of queries, that take dozens of machines over a week to
run.

Our queries include inserts, updates, deletes, joins, views, um... You
name it.

Our product is database middleware and so we have to test against
anything that is a legal SQL query against every sort of database and
operating system combination (PostgreSQL is one of many environments
that we support).

If you have seen the NIST SQL verification suite, that is part of our
test suite.  We also found the PostgreSQL suite useful (though the
PostgreSQL specific things we only run against PostgreSQL).  We also
have our own collection of regression tests that we have gathered over
the past 20 years or so.

I can't be specific because we run every sort of query.  Most of our
hardware is fairly high end (generally 1GB Ethernet, but we do have some
machines that only have 100 MB net cards in them).

I guess that our usage is atypical for general business use but fairly
typical for those companies that produce middleware tool sets.  However,
many of our regressions came from customer feedback and so we do test
lots and lots of valid customer requirements.

I have a simple suggestion:
Put the setsockopt calls in (with the necessary fluff to make it robust)
and then perform the OSDB test.  I guess that unless the OSDB houses the
clients and the server on the same physical hardware you will see a very
large bonus for a very simple change.

 --
 * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore,
MD


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Got no response last time on setsockopt post, so I thought I would reiterate.

2007-06-11 Thread Dann Corbit
 -Original Message-
 From: Andrew Dunstan [mailto:[EMAIL PROTECTED]
 Sent: Monday, June 11, 2007 4:35 PM
 To: Dann Corbit
 Cc: Tom Lane; pgsql-hackers@postgresql.org; Larry McGhaw
 Subject: Re: [HACKERS] Got no response last time on setsockopt post,
so I
 thought I would reiterate.
 
 Dann Corbit wrote:
  However, I won't twist your arm.  I just wanted to be sure that
those at
  the PostgreSQL organization were aware of this simple trick.  Our
  products run on:
  Aix
  BeOS
  Hpux
  Linux (everywhere, including mainframe zLinux)
  MVS
  SunOS
  Solaris
  OpenVMS Alpha
  OpenVMS VAX
  OpenVMS Itanium
  Windows
 
  And several others
 
 
 
 
 
 We already set the SNDBUF on Windows for reasons documented in the
code.

The only place I see it is for Windows *only* in PQCOMM.C (to 32K).  Did
I miss it somewhere else?

 I think if you were to quantify the alleged improvement by platform it
 might allay suspicion.

I do not know if you will see the same results as we do.  We support
ancient and modern operating systems, on ancient and modern hardware (we
have OpenVMS 6.1 running Rdb as old as 4.2, for instance -- 1980's
technology).

The only way for you to see if your environments have the same sort of
benefits that we see is to test it yourselves.

The TCP/IP window size is such a well known optimization setting (in
fact the dominant one) that I am kind of surprised to be catching anyone
unawares.




Re: [HACKERS] Selecting a constant question

2007-06-11 Thread Dann Corbit
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
 [EMAIL PROTECTED] On Behalf Of Kris Jurka
 Sent: Monday, June 11, 2007 5:04 PM
 To: Larry McGhaw
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Selecting a constant question
 
 
 
 On Mon, 11 Jun 2007, Larry McGhaw wrote:
 
  I think perhaps we have lost sight of the main issue:
 
  2) libpq can properly describe the maximum internal data size of any
  varchar column via Pqfmod
 
 SELECT cola || colb FROM tab;

Suggestion:
Return (column size of cola) + (column size of colb) in the maximum
length field.
 
  3) libpq can properly describe the maximum internal data size of any
  numeric constant in a SQL statement via Pqfsize
 
 SELECT 3::numeric;

Suggestion:
Return sizeof (numeric(1,0)) -- after all, it's a constant here.

In the words of the great poet Spike Lee:
'Always do the right thing.'




Re: [HACKERS] Selecting a constant question

2007-06-11 Thread Dann Corbit
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
 [EMAIL PROTECTED] On Behalf Of Andrew Dunstan
 Sent: Monday, June 11, 2007 5:12 PM
 To: Larry McGhaw
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Selecting a constant question
 
 
 
 Larry McGhaw wrote:
  4) libpq **cannot** describe the maximum internal data size of a
char or
  varchar constant!
  Example:  select '123' from any table
 
  This is clearly a bug or serious oversight in libpq that should be
  addressed.
 
  The database *knows* this size of the char constant (obviously), and
  should report the size via a metadata call, as all other relational
  databases do.
 
 
 
 
 What is not clear to me is why it is so important for you to know the
 length of a piece of data you are supplying. If it is so vitally
 important, you could always cast it, e.g. select '123'::varchar(3)

We're a middleware company.  We are not in control of the queries that
are sent.  We can intercept and reformat them, and perhaps that is what
we will need to do for PostgreSQL



Re: [HACKERS] Got no response last time on setsockopt post, so I thought I would reiterate.

2007-06-11 Thread Dann Corbit
 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED]
 Sent: Monday, June 11, 2007 5:12 PM
 To: Dann Corbit
 Cc: pgsql-hackers@postgresql.org; Larry McGhaw
 Subject: Re: [HACKERS] Got no response last time on setsockopt post,
so I
 thought I would reiterate.
 
 Dann Corbit [EMAIL PROTECTED] writes:
  May I suggest:
  http://www-didc.lbl.gov/TCP-tuning/setsockopt.html
  http://www.ncsa.uiuc.edu/People/vwelch/net_perf/tcp_windows.html
 
 I poked around on those pages and almost immediately came across
 http://www.psc.edu/networking/projects/tcptune/
 which appears more up-to-date than the other pages, and it
specifically
 recommends *against* setting SO_SNDBUF or SO_RCVBUF on modern Linuxen.
 So that's one fairly large category where we probably do not want
this.

It can still be a good idea to set it:
http://datatag.web.cern.ch/datatag/howto/tcp.html
64K was just an example.  Like I said before, it should be configurable.
 
 You have not even made it clear whether you were increasing the sizes
in
 the server-to-client or client-to-server direction, and your
handwaving
 about the test conditions makes it even harder to know what you are
 measuring.  I would think for instance that local vs remote
connections
 make a big difference and might need different tuning.

The configuration is a negotiation between client and server.  You may
or may not get what you ask for.  I suggest that it is simple to
implement and worthwhile to test.  But it was only a suggestion.

 BTW, if we look at this issue we ought to also look at whether the
 send/recv quantum in libpq and the backend should be changed.  It's
been
 8K for about ten years now ...

I suspect that TCP/IP packetizing will moderate the effects of changes
to this parameter.



Re: [HACKERS] Selecting a constant question

2007-06-11 Thread Dann Corbit
 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED]
 Sent: Monday, June 11, 2007 5:32 PM
 To: Larry McGhaw
 Cc: Alvaro Herrera; Dann Corbit; Gregory Stark; Martijn van
Oosterhout;
 pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Selecting a constant question
 
 Larry McGhaw [EMAIL PROTECTED] writes:
  I think perhaps we have lost sight of the main issue:
  1) libpq can properly describe the maximum internal data size of any
  numeric or char column in a table via Pqfsize
  2) libpq can properly describe the maximum internal data size of any
  varchar column via Pqfmod
  3) libpq can properly describe the maximum internal data size of any
  numeric constant in a SQL statement via Pqfsize
 
 None of the above statements are actually true, at least not when you
 take off your blinders and note the existence of unconstrained-width
 numeric and text columns.

Unconstrained-width columns are not what is being discussed here.  The
issue is constant expressions of known width.
 
  The database *knows* this size of the char constant (obviously),
 
 No, what it knows (and reports) is type information.  There are a
small
 number of datatypes where you can infer a maximum width from knowledge
 of the datatype.  There are many others where you can't set an upper
 bound from this knowledge --- at least not a usefully tight one.

If you do not know how large 1::numeric is, then how can you know
whether it is safe or not to insert it into a column of type
numeric(12,4)?

If you do not know how large 'Joe'::varchar is, then how can you know
whether it is safe to insert it into a column of type varchar(256)?

Clearly, neither of these operations will cause any problems and so the
size of a constant can be determined.
 
 Anyway, if we were to cast those constants to something other than
 unknown, it would be text, not varchar, and you'd still have the same
 issue.

Other database systems can manage this, and the programmers of those
database systems are not smarter than the programmers of the PostgreSQL
group.  Therefore I can conclude that if the PostgreSQL group decides it
is important, then they can figure out the size of a string or numeric
constant.




Re: [HACKERS] Selecting a constant question

2007-06-11 Thread Dann Corbit
 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED]
 Sent: Monday, June 11, 2007 3:35 PM
 To: Dann Corbit
 Cc: Gregory Stark; Martijn van Oosterhout;
pgsql-hackers@postgresql.org;
 Larry McGhaw
 Subject: Re: [HACKERS] Selecting a constant question
 
 Dann Corbit [EMAIL PROTECTED] writes:
  Giving me the information about the data type will be enough.  As an
  example, in this case we have varchar data.  If the server should be so
  kind as to report varchar(1) for '1' or varchar(3) for '123' then I
  would not have any difficulty binding the data to a grid.
 
 This seems merest fantasy.  Reflect on multibyte character sets for a
 bit --- even if it's known that the column is varchar(3) there is no
 guarantee that the value will fit in 3 bytes.

If the server bound the data as UNICODE, then it will tell me
UNICODE(3).  I know how big this will be.

In the worst case scenario it will fit in 3*4 = 12 bytes.

If the server is built without UNICODE enabled, then it will definitely
fit in 3 bytes.




Re: [HACKERS] Selecting a constant question

2007-06-11 Thread Dann Corbit
 -Original Message-
 From: Hannu Krosing [mailto:[EMAIL PROTECTED]
 Sent: Monday, June 11, 2007 8:42 PM
 To: Dann Corbit
 Cc: Tom Lane; Gregory Stark; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Selecting a constant question
 
 On Monday, 2007-06-11 at 13:38, Dann Corbit wrote:
   -Original Message-
   From: Tom Lane [mailto:[EMAIL PROTECTED]
   Sent: Monday, June 11, 2007 1:32 PM
   To: Dann Corbit
   Cc: Gregory Stark; pgsql-hackers@postgresql.org
   Subject: Re: [HACKERS] Selecting a constant question
 ...
   You should be treating typlen as signed not unsigned, and not assuming
   a fixed width for any negative value.
  
   Since the width refers to the server internal representation, and not to
   what comes down the wire, I find it pretty strange for an application to
   be using typlen for anything at all actually.
 
  Thanks for the response.
 
  Since libpq function PQfsize returns -2 for all constant character
  strings in SQL statements ... What is the proper procedure to determine
  the length of a constant character column after query execution but
  before fetching the first row of data?
 
 Why not just get the first row and determine the width from it before
 you actually use any of the data?

What if the second row is 1000x longer?


