Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index
On Thu, 2006-02-16 at 21:33 -0800, David Lang wrote: > > In SQL_ASCII, just take the first 4 characters (or 8, if using a 64-bit > > sortKey as elsewhere suggested). The sorting key doesn't need to be a > > one-to-one mapping. > > that would violate your second contraint ( f(a)==f(b) iff (a==b) ) > > if you could drop that constraint (the cost of which would be extra 'real' > compares within a bucket) then a helper function per datatype could work > as you are talking. I think we're actually on the same page here; you're right that the constraint above ( f(a)==f(b) iff a==b ) can't be extended to data types with more than 32 bits of value space. But the constraint I listed was actually: if a==b then f(a)==f(b) Which doesn't imply 'if and only if'. It's a similar constraint to hashcodes; the same value will always have the same hash, but you're not guaranteed that the hashcodes for two distinct values will be unique. -- Mark ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index
On Thu, 2006-02-16 at 17:51 -0500, Greg Stark wrote: > > > Data types which could probably provide a useful function for f would be > > > int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII). > > How exactly do you imagine doing this for text? > > I could see doing it for char(n)/varchar(n) where n<=4 in SQL_ASCII though. In SQL_ASCII, just take the first 4 characters (or 8, if using a 64-bit sortKey as elsewhere suggested). The sorting key doesn't need to be a one-to-one mapping. -- Mark Lewis ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index
On Thu, 2006-02-16 at 12:15 -0500, Tom Lane wrote: > Once or twice we've kicked around the idea of having some > datatype-specific sorting code paths alongside the general-purpose one, > but I can't honestly see this as being workable from a code maintenance > standpoint. > > regards, tom lane It seems that instead of maintaining a different sorting code path for each data type, you could get away with one generic path and one (hopefully faster) path if you allowed data types to optionally support a 'sortKey' interface by providing a function f which maps inputs to 32- bit int outputs, such that the following two properties hold: f(a)>=f(b) iff a>=b if a==b then f(a)==f(b) So if a data type supports the sortKey interface you could perform the sort on f(value) and only refer back to the actual element comparison functions when two sortKeys have the same value. Data types which could probably provide a useful function for f would be int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII). Depending on the overhead, you might not even need to maintain 2 independent search code paths, since you could always use f(x)=0 as the default sortKey function which would degenerate to the exact same sort behavior in use today. -- Mark Lewis ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [PERFORM] Releasing memory during External sorting?
operations != passes. If you were clever, you could probably write a modified bubble-sort algorithm that only made 2 passes. A pass is a disk scan, operations are then performed (hopefully in memory) on what you read from the disk. So there's no theoretical log N lower-bound on the number of disk passes. Not that I have anything else useful to add to this discussion, just a tidbit I remembered from my CS classes back in college :) -- Mark On Fri, 2005-09-23 at 13:17 -0400, Tom Lane wrote: > Ron Peacetree <[EMAIL PROTECTED]> writes: > > 2= No optimal external sorting algorithm should use more than 2 passes. > > 3= Optimal external sorting algorithms should use 1 pass if at all possible. > > A comparison-based sort must use at least N log N operations, so it > would appear to me that if you haven't got approximately log N passes > then your algorithm doesn't work. > > regards, tom lane > > ---(end of broadcast)--- > TIP 9: In versions below 8.0, the planner will ignore your desire to >choose an index scan if your joining column's datatypes do not >match ---(end of broadcast)--- TIP 6: explain analyze is your friend