Re: [HACKERS] Some array semantics issues
Greg Stark [EMAIL PROTECTED] writes: [ this is back up-thread a bit ] And changing that would make it harder to test just the contents of the array without having to match bounds as well. That is, You couldn't say list = '{1,2}' to test if the array contained 1,2. You would have to, well, I'm not even sure how you would test it actually. Maybe something kludgy like '{}'::int[] || list = '{1,2}' ? Given the just-committed changes to avoid having array_push/array_cat generate non-spec lower bounds unnecessarily, do you still think it's important to have a variant of array comparison that ignores lower bounds? ISTM that ignoring lower bounds is definitely something that violates the principle of least surprise. There was an ease-of-use argument for it before, but now that we changed the other thing I think we don't need such a kluge. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Some array semantics issues
Tom Lane wrote: Given the just-committed changes to avoid having array_push/array_cat generate non-spec lower bounds unnecessarily, do you still think it's important to have a variant of array comparison that ignores lower bounds? ISTM that ignoring lower bounds is definitely something that violates the principle of least surprise. There was an ease-of-use argument for it before, but now that we changed the other thing I think we don't need such a kluge. I agree. At this point, having an array with other than 1 as a lower bound takes a very conscious decision. I'd think that if you cared that much about the lower bound, you'd not want to ignore it when it comes to comparison. Joe ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Some array semantics issues
Greg Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: I don't have a lot of use for arguments that go we should remove any functionality that's not in the spec ... ISTM that variable lower bounds are clearly useful for some applications, and even if they had bugs in earlier releases that's not an argument for removing them. Normally I don't either. But it's not just functionality that's not in the spec. It's functionality that creates behaviour the spec specifies otherwise. AFAICS the only cases that give rise to arrays with lower bounds other than one are: * direct entry of a literal with explicit lower bound; * assignment to a subscript or slice below 1; * array_prepend (and the N/N+1-dimension case of array_cat). I don't think it's not in the spec is a reason for rejecting #1 or #2. But I agree that there is a reasonable case for modifying array_prepend and array_cat so that they won't generate non-spec lower bounds where none existed before. How about changing them so that the lower bound of the right-hand array is preserved, rather than decreased by one? regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Some array semantics issues
Tom Lane wrote: AFAICS the only cases that give rise to arrays with lower bounds other than one are: * direct entry of a literal with explicit lower bound; * assignment to a subscript or slice below 1; * array_prepend (and the N/N+1-dimension case of array_cat). I don't think it's not in the spec is a reason for rejecting #1 or #2. But I agree that there is a reasonable case for modifying array_prepend and array_cat so that they won't generate non-spec lower bounds where none existed before. How about changing them so that the lower bound of the right-hand array is preserved, rather than decreased by one? That seems reasonable. I'll do it if you'd like... Joe ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Some array semantics issues
Joe Conway [EMAIL PROTECTED] writes: Tom Lane wrote: How about changing them so that the lower bound of the right-hand array is preserved, rather than decreased by one? That seems reasonable. I'll do it if you'd like... I'll take care of it --- I'm still hacking in that general area anyway. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Some array semantics issues
Tom Lane wrote: Joe Conway [EMAIL PROTECTED] writes: ... My hope was that eventually anyarray I/O functions could eliminate the need to create an array type for every data type you wanted to use as an array element. Interesting thought, but then how do you declare the type of an array column, or the type of a function argument that's not supposed to range over every array type? If we can't use an OID to identify a data type completely, we're going to have lots of problems. You only really need two pieces of information to uniquely identify an array data type -- the OID of the (leaf-node) scalar elements, and the fact that what you have is an array. Even if it is a nested structure of arrays, by recursing (max 5 times), you can eventually find the scalar elements. Last year I played around with this and had it partially working, but then got too busy to pursue it further. Joe ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Some array semantics issues
Joe Conway [EMAIL PROTECTED] writes: Tom Lane wrote: If we can't use an OID to identify a data type completely, we're going to have lots of problems. You only really need two pieces of information to uniquely identify an array data type -- the OID of the (leaf-node) scalar elements, and the fact that what you have is an array. Yes, but replacing one piece of data (type OID) with two (type OID plus array flag bit) is going to be notationally horrible. What's more it will force knowledge about the existence of array types into a lot of places that don't have to care right now. For something that is ultimately just a flange on the side of SQL, that doesn't seem like a good tradeoff. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Some array semantics issues
Tom Lane [EMAIL PROTECTED] writes: regression=# select '[0:2]={1,2,3}'::int[] = '{1,2,3}'::int[]; ?column? -- t (1 row) regression=# select '{1,2,3,4}'::int[] = '{{1,2},{3,4}}'::int[]; ?column? -- t (1 row) This seems pretty bogus as well. The second case seems utterly bogus. But the first case seems maybe justifiable. maybe. In the past Postgres treated the array bounds as so insignificant they weren't even worth preserving across a dump/restore. And changing that would make it harder to test just the contents of the array without having to match bounds as well. That is, You couldn't say list = '{1,2}' to test if the array contained 1,2. You would have to, well, I'm not even sure how you would test it actually. Maybe something kludgy like '{}'::int[] || list = '{1,2}' ? I'm not entirely against the idea of making array bounds significant but I guess we would need some convenient way of taking them out of the picture too. Perhaps another equality operator. -- greg ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Some array semantics issues
Greg Stark [EMAIL PROTECTED] writes: And changing that would make it harder to test just the contents of the array without having to match bounds as well. Fair point, but currently it's impossible to make a comparison that *does* consider the bounds, which one would think would be the ordinary meaning of equality. I'm not entirely against the idea of making array bounds significant but I guess we would need some convenient way of taking them out of the picture too. Perhaps another equality operator. I could go for a separate operator that has the current behavior (might as well ignore number of dimensions too, if we're going to ignore bounds). Any thoughts about the operator name? regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Some array semantics issues
Tom Lane [EMAIL PROTECTED] writes: I could go for a separate operator that has the current behavior (might as well ignore number of dimensions too, if we're going to ignore bounds). Any thoughts about the operator name? Well to me these are two different cases. At least the way I see it {1,2} is a list of two numbers, and {{1,2,},{3,4}} is a list of two lists. They aren't the same and they don't even contain the same thing. It occurs to me that it would also make sense to have an operator that considered the arrays in an order-insensitive comparison. It all depends on what you're using the arrays to represent. If you're implementing something where each slot of the array corresponds to some specific meaning then you need the array bounds included. If you're representing stacks where the array bounds march up as they're used then you don't really want to include the array bounds in your comparison. If you're representing a denormalized one-to-many relationship (being aware of all the associated pros and cons of denormalization of course) then you really don't care about the order at all. Personally I can't really think of any cases where the shape of the array doesn't matter though. -- greg ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Some array semantics issues
Greg Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: I could go for a separate operator that has the current behavior (might as well ignore number of dimensions too, if we're going to ignore bounds). Any thoughts about the operator name? Well to me these are two different cases. At least the way I see it {1,2} is a list of two numbers, and {{1,2,},{3,4}} is a list of two lists. They aren't the same and they don't even contain the same thing. Well, in that case what do you think about {{1,2},{3,4},{5,6},{7,8}} vs {{1,2,3,4},{5,6,7,8}} These have the same physical contents and the same number of dimensions, so unless you want to consider them equal, you have to consider the dimension values. I think what we may be dancing around here is that there are some cases where it makes sense to ignore the lower bounds, as opposed to the axis lengths. I'm not convinced that there are any cases where it makes sense to compare the number of dimensions without comparing the axis lengths --- but I can see the argument that lower bounds might be uninteresting, particularly seeing that array_push and and array_cat do some not-necessarily-always-right things with them. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Some array semantics issues
Tom Lane [EMAIL PROTECTED] writes: Well, in that case what do you think about {{1,2},{3,4},{5,6},{7,8}} vs {{1,2,3,4},{5,6,7,8}} In the first case the first element is {1,2} and in the second case the first element is {1,2,3,4} so from my point of view there's no way these are the same. None of the three use cases I conjured would want them considered equal, which isn't to say there isn't some data structure somewhere out there which would, but I haven't thought of it. -- greg ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Some array semantics issues
Greg Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: Well, in that case what do you think about {{1,2},{3,4},{5,6},{7,8}} vs {{1,2,3,4},{5,6,7,8}} In the first case the first element is {1,2} and in the second case the first element is {1,2,3,4} so from my point of view there's no way these are the same. Well, then I think we're converging on agreement that array comparison should always take into account the number of dimensions and the axis lengths. What seems still in question is whether to compare or ignore the axis lower bounds. I'd argue that ordinary equality should include the lower bounds, but I'm willing to provide a separate operator (or whole btree opclass if people want it) that ignores the lower bounds. We just need a name. Maybe ~=, ~, etc? regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Some array semantics issues
On Wed, Nov 16, 2005 at 03:03:53PM -0500, Greg Stark wrote: Tom Lane [EMAIL PROTECTED] writes: I could go for a separate operator that has the current behavior (might as well ignore number of dimensions too, if we're going to ignore bounds). Any thoughts about the operator name? Well to me these are two different cases. At least the way I see it {1,2} is a list of two numbers, and {{1,2,},{3,4}} is a list of two lists. They aren't the same and they don't even contain the same thing. Right. It occurs to me that it would also make sense to have an operator that considered the arrays in an order-insensitive comparison. That sounds more like the SQL:2003 MULTISET, which is essentially unordered. Any plans for these? Cheers, D -- David Fetter [EMAIL PROTECTED] http://fetter.org/ phone: +1 510 893 6100 mobile: +1 415 235 3778 Remember to vote! ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Some array semantics issues
David Fetter [EMAIL PROTECTED] writes: On Wed, Nov 16, 2005 at 03:03:53PM -0500, Greg Stark wrote: It occurs to me that it would also make sense to have an operator that considered the arrays in an order-insensitive comparison. That sounds more like the SQL:2003 MULTISET, which is essentially unordered. Any plans for these? Seems to me it would be really expensive to try to make such a comparison directly with the present array representation. The only sensible way to do it would be to sort the elements of the two arrays (using the comparison operator of the element data type) and then compare the results. So you don't actually need a variant equality operator, you just need a generic array_sort() function, and then go array_sort(x) = array_sort(y). Such a function might have other uses besides this, too. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Some array semantics issues
Tom Lane wrote: Greg Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: Well, in that case what do you think about {{1,2},{3,4},{5,6},{7,8}} vs {{1,2,3,4},{5,6,7,8}} In the first case the first element is {1,2} and in the second case the first element is {1,2,3,4} so from my point of view there's no way these are the same. Well, then I think we're converging on agreement that array comparison should always take into account the number of dimensions and the axis lengths. What seems still in question is whether to compare or ignore the axis lower bounds. I'd argue that ordinary equality should include the lower bounds, but I'm willing to provide a separate operator (or whole btree opclass if people want it) that ignores the lower bounds. We just need a name. Maybe ~=, ~, etc? A couple of thoughts based on the last time I read SQL2003 WRT arrays. First, the spec only allows arrays to have a lower bound of 1. That requirement simplifies a whole lot of things. I don't think that many people actually depend on other than 1 as a lower bound (or at least if they do, they weren't dumping and reloading those databases prior to 8.0) -- maybe given other possibly non-backward compatible changes for NULLs, we should also change this? Second, the spec does not really directly allow for multidimensional arrays. What it does allow is nesting of arrays. So as Greg states, {1,2} is clearly a different array than {1,2,3,4}. I had been thinking that when (if?) the array literal parser and related infrastructure is rewritten, it should be done so that arrays-as-array-elements are processed similar to any scalar element (and perhaps tuples as array elements as well). My hope was that eventually anyarray I/O functions could eliminate the need to create an array type for every data type you wanted to use as an array element. Joe ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Some array semantics issues
Joe Conway [EMAIL PROTECTED] writes: First, the spec only allows arrays to have a lower bound of 1. That requirement simplifies a whole lot of things. I don't think that many people actually depend on other than 1 as a lower bound (or at least if they do, they weren't dumping and reloading those databases prior to 8.0) -- maybe given other possibly non-backward compatible changes for NULLs, we should also change this? I don't have a lot of use for arguments that go we should remove any functionality that's not in the spec ... ISTM that variable lower bounds are clearly useful for some applications, and even if they had bugs in earlier releases that's not an argument for removing them. ... My hope was that eventually anyarray I/O functions could eliminate the need to create an array type for every data type you wanted to use as an array element. Interesting thought, but then how do you declare the type of an array column, or the type of a function argument that's not supposed to range over every array type? If we can't use an OID to identify a data type completely, we're going to have lots of problems. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Some array semantics issues
Tom Lane [EMAIL PROTECTED] writes: Joe Conway [EMAIL PROTECTED] writes: First, the spec only allows arrays to have a lower bound of 1. That requirement simplifies a whole lot of things. I don't think that many people actually depend on other than 1 as a lower bound (or at least if they do, they weren't dumping and reloading those databases prior to 8.0) -- maybe given other possibly non-backward compatible changes for NULLs, we should also change this? I don't have a lot of use for arguments that go we should remove any functionality that's not in the spec ... ISTM that variable lower bounds are clearly useful for some applications, and even if they had bugs in earlier releases that's not an argument for removing them. Normally I don't either. But it's not just functionality that's not in the spec. It's functionality that creates behaviour the spec specifies otherwise. That is, if you have an array [1,2] the spec says you can get 1 by referring to arr[1]. On Postgres you have to take more care. There could easily be code out there that won't work on Postgres because of this difference. The main reason for having non-zero lower bounds in the first place was originally that NULLs weren't allowed in arrays. Otherwise you run into problems when you try to set arr[5] = 1 when there isn't an arr[1]..arr[4]. But if we have NULLs in arrays then we could easily have all arrays have lower bounds of 1. We don't even have to store the leading NULL elements. I think having all arrays start at 1 would actually much simplify the semantics and avoid a lot of strange corner cases and surprising behaviour that will follow from having non-1 based arrays. -- greg ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Some array semantics issues
Greg Stark wrote: Tom Lane [EMAIL PROTECTED] writes: Joe Conway [EMAIL PROTECTED] writes: First, the spec only allows arrays to have a lower bound of 1. That requirement simplifies a whole lot of things. I don't think that many people actually depend on other than 1 as a lower bound (or at least if they do, they weren't dumping and reloading those databases prior to 8.0) -- maybe given other possibly non-backward compatible changes for NULLs, we should also change this? I don't have a lot of use for arguments that go we should remove any functionality that's not in the spec ... ISTM that variable lower bounds are clearly useful for some applications, and even if they had bugs in earlier releases that's not an argument for removing them. Normally I don't either. But it's not just functionality that's not in the spec. It's functionality that creates behaviour the spec specifies otherwise. This is an important point. SQL2003 doesn't leave this as undefined: 4.10.2 Arrays An array is a collection A in which each element is associated with exactly one ordinal position in A. If n is the cardinality of A, then the ordinal position p of an element is an integer in the range 1 (one) = p = n. If EDT is the element type of A, then A can thus be considered as a function of the integers in the range 1 (one) to n into EDT. Joe ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq