Re: [HACKERS] Removing SORTFUNC_LT/REVLT
Martijn van Oosterhout writes: > On Thu, Dec 29, 2005 at 04:33:32PM -, Andrew - Supernews wrote: >> Does it matter? How would the same operator specify different orderings >> in different operator classes, > Well, we currently don't forbid it and indeed encourage it (by > encouraging reverse operator classes) as the only way to handle the > ORDER a, b DESC case right now. > I don't think I can find any other examples right now. I don't think > I'd have a problem with forbidding it at some future date. Right, the reverse-sort opclass is the only practical example that anyone's pointed out ... for btree. For GiST it would be a serious error to try to restrict operators to appear in at most one opclass. Therefore, we're not going to be forbidding it, and the code has to behave in a sane fashion if someone does it. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Removing SORTFUNC_LT/REVLT
On Thu, Dec 29, 2005 at 04:33:32PM -, Andrew - Supernews wrote: > On 2005-12-29, Tom Lane <[EMAIL PROTECTED]> wrote: > > Well, no, that's not the problem: the problem is that you should be able > > to specify ORDER BY any sort ordering that the system can deal with, and > > the USING syntax is in fact too impoverished to do that. What if the > > mentioned operator is in more than one operator class? I believe that > > ATM the code makes a random choice of which opclass' sort function to > > use, which pretty much sucks. > > Does it matter? How would the same operator specify different orderings > in different operator classes, given that it must be a strict weak ordering > for sorting to even work, and such an ordering is completely determined by > either one of its greater-than/less-than operators? Well, we currently don't forbid it and indeed encourage it (by encouraging reverse operator classes) as the only way to handle the ORDER a, b DESC case right now. I don't think I can find any other examples right now. I don't think I'd have a problem with forbidding it at some future date. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them. pgpWfmh0w16LM.pgp Description: PGP signature
Re: [HACKERS] Removing SORTFUNC_LT/REVLT
On Sun, Jan 01, 2006 at 07:48:56AM -, Andrew - Supernews wrote: > Doesn't this result in incorrect output in multi-column sorts? > > i.e. if 'Foo' = 'foo', but for sorting purposes you always sort them > with 'Foo' first, then a multicolumn sort of the following data: > > ('Foo',1) > ('foo',2) > ('Foo',3) > > would produce the wrong output, no? Yeah, I was thinking about that but I'm not sure how big a problem it is. We could make this work by having the sorting routines (and by extension the lookups for indexes) by doing two passes: one with the comparison routines and use the actual sorting routines only on tie-breaks. A kind of big mark against the idea. Although this is exactly the kind of work required for string sorting in many locales... Whatever we do, the result of ORDER BY has to match index order. Does this mean scankeys need to have two functions, in case of tie-breaking multi-column keys. I dunno but it needs thinking about. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them. pgpoA71wMSYFC.pgp Description: PGP signature
Re: [HACKERS] Removing SORTFUNC_LT/REVLT
On 2005-12-29, Tom Lane <[EMAIL PROTECTED]> wrote: > Well, no, that's not the problem: the problem is that you should be able > to specify ORDER BY any sort ordering that the system can deal with, and > the USING syntax is in fact too impoverished to do that. What if the > mentioned operator is in more than one operator class? I believe that > ATM the code makes a random choice of which opclass' sort function to > use, which pretty much sucks. Does it matter? How would the same operator specify different orderings in different operator classes, given that it must be a strict weak ordering for sorting to even work, and such an ordering is completely determined by either one of its greater-than/less-than operators? -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Removing SORTFUNC_LT/REVLT
On 2005-12-31, Martijn van Oosterhout wrote: > You do bring up the possibility of secondary sort functions. Functions > which are not involved in testing for equality, but provide addition > sorting so that even in a case-insensetive sort, the different > variations in case appear together. "All variations are equal, but some > are more equal than others" type setup. Doesn't this result in incorrect output in multi-column sorts? i.e. if 'Foo' = 'foo', but for sorting purposes you always sort them with 'Foo' first, then a multicolumn sort of the following data: ('Foo',1) ('foo',2) ('Foo',3) would produce the wrong output, no? -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Removing SORTFUNC_LT/REVLT
On Sat, Dec 31, 2005 at 02:54:18PM -0500, Tom Lane wrote: > The example of case-insensitive sorting suggests that we need to assume > that sort comparison functions can make finer-grained comparisons than > the associated "equals" operator does. The current infrastructure > forces these to be exactly the same, but as long as we're busy > reinventing stuff, we could have two comparison functions associated > with a btree opclass: one that mimics the operators' behavior and one > that makes finer-grained comparisons and defines the actual sort order. Indeed, that's exactly the thought I had this afternoon, distiguish a "collation" and a "comparison" function. It's certainly a lot easier to implement than anything else I could think of Have a great New Year everyone, -- Martijn van Oosterhout http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them. pgpLhmlXjIqDY.pgp Description: PGP signature
Re: [HACKERS] Removing SORTFUNC_LT/REVLT
Martijn van Oosterhout writes: > On Sat, Dec 31, 2005 at 12:58:19AM -0500, Greg Stark wrote: >> Two things are either the same or they aren't, and that can't change >> based on context. > So someone who wants a case-insensetive search actually doesn't want > "Foo" to equal "foo"? That nice simple worldview falls down in other areas as well. An example is zero and minus zero in IEEE math: they are equal for some purposes but not others. I think you really have to say that equality is defined with respect to a particular datatype and a particular set of operators. The example of case-insensitive sorting suggests that we need to assume that sort comparison functions can make finer-grained comparisons than the associated "equals" operator does. The current infrastructure forces these to be exactly the same, but as long as we're busy reinventing stuff, we could have two comparison functions associated with a btree opclass: one that mimics the operators' behavior and one that makes finer-grained comparisons and defines the actual sort order. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Removing SORTFUNC_LT/REVLT
On Sat, Dec 31, 2005 at 12:58:19AM -0500, Greg Stark wrote: > I think this is a mistake -- the same mistake that got us into trouble with > Turkish. > > Hashing depends on the concept of equality which is integral to the type. Two > things are either the same or they aren't, and that can't change based on > context. So someone who wants a case-insensetive search actually doesn't want "Foo" to equal "foo"? If you're arguing that that should be a different type, well, that's a possibility. But does that mean someone who wants an accent insensetive match also needs a new type? And a phonebook match, where "Mc" and "Mac" are the same? It was my understanding that the problem with Turkish/Hungarian was the we only allow one collation for strings over the whole database. The point is that in the future you will be able to select this on a per column/index/query basis, so we don't need to stick to such a restriction if the user explicitly asks to ignore it. On a more practical level, a Hash Join needs to produce the same results as a Merge Join, so if (a = b) then (hash(a) = hash(b)). So if the user types (a = b COLLATE ignorecase) then the hash function needs to change to match. > Specifically in the case of strings, two strings should only be considered > "equal" if they consist of the exact same series of characters. (That is, they > could be encoded differently but they have to encode the same actual > characters.) That they happen to sort equally compared to all other strings > doesn't mean that they're equal. Sure, for straight strings (COLLATE POSIX), that's absolutly a requirement. But people also have other requirements, like treating strings case-insensetively. I don't think we should restrict ourselves to not being able to support their wishes. You do bring up the possibility of secondary sort functions. Functions which are not involved in testing for equality, but provide addition sorting so that even in a case-insensetive sort, the different variations in case appear together. "All variations are equal, but some are more equal than others" type setup. Thanks for the feedback, -- Martijn van Oosterhout http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them. pgp1xpk5wgBqf.pgp Description: PGP signature
Re: [HACKERS] Removing SORTFUNC_LT/REVLT
Martijn van Oosterhout writes: > Yet hashing is also a property of the collation, not the type. The same > string in different locales would hash differently. I think this is a mistake -- the same mistake that got us into trouble with Turkish. Hashing depends on the concept of equality which is integral to the type. Two things are either the same or they aren't, and that can't change based on context. Specifically in the case of strings, two strings should only be considered "equal" if they consist of the exact same series of characters. (That is, they could be encoded differently but they have to encode the same actual characters.) That they happen to sort equally compared to all other strings doesn't mean that they're equal. -- greg ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Removing SORTFUNC_LT/REVLT
On Fri, Dec 30, 2005 at 10:18:48AM -0500, Tom Lane wrote: > I really need to study your mail from the other day, but unfortunately > other pressures will probably keep me from getting to it today :-(. > One comment though --- it's not really sane to include ASC/DESC in there > is it? I thought the spec wanted ORDER BY foo COLLATE bar [ASC/DESC] > ... or if not, users certainly will. If every single collation has to > be created in a matched ASC/DESC pair, you've done it wrong. Well, that's an interesting question. From a user point of view, forward and backwards isn't a real big deal. Internally it is though. Two places really care: - Pathkeys - Index keys For pathkeys in particular, I really don't like the propect of going through all the planning code changing a single pathkey (which is currently just the oid of the sorting operator) to a collation plus direction pair. Index keys are similar, though the code to change it not as much. We'd still need to invent another column to store the direction. For users ofcourse, we want them to look similar. Currently forward and backward collations have different names and different OIDs, though it's quite reasonable to give them the same name and require the user to say DESC to access the reverse collation (still with different OIDs). However, the real reason I think they should both be declared is that as actual objects, they don't resemble eachother very much. For example, an int4 forward collation can be merge joined with an int2 forward collation but not with an int2 backward collation. When you start considering relationships between collations, forwards and backwards behave very differently. Final food for thought: does the "xid" type have a collation. There's no b-tree operator class but it does have the concept of equality. It has a valid hashing function. Is this a collation? If so it has no order (so ASC/DESC don't apply). Yet hashing is also a property of the collation, not the type. The same string in different locales would hash differently. Where this heads is that a user creates a type, then a collation by specifying an order and a hash function. The system then creates the operator classes in the background. This is a radical departure from where we are now and not something I'm seriously proposing. Don't worry about how long it takes to think about it. 'tis the season to be jolly, no? I myself will be offline for most of next week anyway and the next few days are to be enjoyed. Seasons greetings, -- Martijn van Oosterhout http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them. pgp1AXRb2Cbax.pgp Description: PGP signature
Re: [HACKERS] Removing SORTFUNC_LT/REVLT
Martijn van Oosterhout writes: > On Thu, Dec 29, 2005 at 10:49:23AM -0500, Tom Lane wrote: >> What I'd really like is to deprecate the "USING operator" syntax in >> favor of a "USING operatorclassname" syntax. Actually, "USING opclass >> [ASC/DESC]" would get the job done, since given an opclass you can >> certainly run the sort function either normal or reverse. > Thought of something this morning: this seems OK at first glance but I > don't think it's workable. The example being locale dependant sorting. > I really don't think we want to create a new operator class for each > possible way you can sort strings. Well, you would need to add a COLLATE layer on top of this in just the same way as you'd need a COLLATE layer now if you want locale-dependent sorting. I didn't claim it handled that; just pointing out that COLLATE doesn't handle this, either. > Collations (currently anyway) are really just an operator class + > [ASC/DESC] + optional locale rolled into a single identifier. I really need to study your mail from the other day, but unfortunately other pressures will probably keep me from getting to it today :-(. One comment though --- it's not really sane to include ASC/DESC in there is it? I thought the spec wanted ORDER BY foo COLLATE bar [ASC/DESC] ... or if not, users certainly will. If every single collation has to be created in a matched ASC/DESC pair, you've done it wrong. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Removing SORTFUNC_LT/REVLT
On Thu, Dec 29, 2005 at 10:49:23AM -0500, Tom Lane wrote: > What I'd really like is to deprecate the "USING operator" syntax in > favor of a "USING operatorclassname" syntax. Actually, "USING opclass > [ASC/DESC]" would get the job done, since given an opclass you can > certainly run the sort function either normal or reverse. Thought of something this morning: this seems OK at first glance but I don't think it's workable. The example being locale dependant sorting. I really don't think we want to create a new operator class for each possible way you can sort strings. Collations (currently anyway) are really just an operator class + [ASC/DESC] + optional locale rolled into a single identifier. The idea being that when you use '<', the collation on that node decides unambiguously which version of '<' you mean, rather than looking up the operator and trying to guess which operator class you meant. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them. pgpugX7Tq8Z5h.pgp Description: PGP signature
Re: [HACKERS] Removing SORTFUNC_LT/REVLT
On Thu, Dec 29, 2005 at 10:49:23AM -0500, Tom Lane wrote: > Well, no, that's not the problem: the problem is that you should be able > to specify ORDER BY any sort ordering that the system can deal with, and > the USING syntax is in fact too impoverished to do that. What if the > mentioned operator is in more than one operator class? I believe that > ATM the code makes a random choice of which opclass' sort function to > use, which pretty much sucks. Ah, that problem. Yeah, at this stage we can't really do much about that. All I've done at this stage is made it so that if the operator isn't a member of any operator class it displays: ERROR: Couldn't find order function associated with operator XXX HINT: Create a B-tree operator class with this operator in it Eventually the COLLATE option should make the choice unambiguous. But I'm not there yet. One benefit right now is that it permits some code reorganisation to remove the myFunctionCall2 hack, by passing FunctionCallInfo rather than FmgrInfo. > We could keep the "USING operator" syntax but insist that it's only > allowed if there's exactly one possible opclass mapping. Can't do that yet, that breaks reverse operator classes. Ofcourse, we may be able to do it eventually if there's another way to do the same thing. The main problem being that ORDER BY x ASC is actually made equivalent to ORDER BY x USING <, so the code still has to work for the USING case to not break the normal case. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them. pgpw2Y8u7jY9k.pgp Description: PGP signature
Re: [HACKERS] Removing SORTFUNC_LT/REVLT
Martijn van Oosterhout writes: > Hmm. By feature I assume you mean "ORDER BY ... USING" (which no-one > could find an example of) and not "requiring the operator to be part of > an opclass". > In fact, I don't think we ever need to remove the syntax, just as long > as the operator is part of an operator class, it'll be fine. Well, no, that's not the problem: the problem is that you should be able to specify ORDER BY any sort ordering that the system can deal with, and the USING syntax is in fact too impoverished to do that. What if the mentioned operator is in more than one operator class? I believe that ATM the code makes a random choice of which opclass' sort function to use, which pretty much sucks. I haven't had time yet to digest the material you posted yesterday about COLLATE. Maybe there's a solution in there, but I think it could only happen if we assume that every potential sort operator appears in only one opclass. Which seems like a pretty restrictive assumption, even granted that COLLATE will start to carry some of the load of picking different sorting options. What I'd really like is to deprecate the "USING operator" syntax in favor of a "USING operatorclassname" syntax. Actually, "USING opclass [ASC/DESC]" would get the job done, since given an opclass you can certainly run the sort function either normal or reverse. We could keep the "USING operator" syntax but insist that it's only allowed if there's exactly one possible opclass mapping. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Removing SORTFUNC_LT/REVLT
On Wed, Dec 28, 2005 at 07:38:36PM -0500, Tom Lane wrote: > Martijn van Oosterhout writes: > > The issue is whether anything you want to ORDER BY needs to be > > described by an B-tree operator class, and hence have a real sort > > function. > > I think it's reasonable to remove that feature, *after* we provide > a workable substitute. So, "no" to both questions ... Hmm. By feature I assume you mean "ORDER BY ... USING" (which no-one could find an example of) and not "requiring the operator to be part of an opclass". The only people affected would be people who defined a less-than operator but no operator class, which you said yourself would probably just be encouraging programmer lazyness. I wasn't suggesting removing the ORDER BY ... USING syntax, just these two options from the sorting routines. In fact, I don't think we ever need to remove the syntax, just as long as the operator is part of an operator class, it'll be fine. Have a nice day, -- Martijn van Oosterhout http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them. pgpiZF30UI8xD.pgp Description: PGP signature
Re: [HACKERS] Removing SORTFUNC_LT/REVLT
Martijn van Oosterhout writes: > The issue is whether anything you want to ORDER BY needs to be > described by an B-tree operator class, and hence have a real sort > function. > 1. Do people have any problems with this? > 2. Would a patch for this be accepted seperate from the whole collation > stuff? I think it's reasonable to remove that feature, *after* we provide a workable substitute. So, "no" to both questions ... regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster