subject:"Re\: \[HACKERS\] Some array semantics issues"

Re: [HACKERS] Some array semantics issues

2005-11-18 Thread Tom Lane

Greg Stark [EMAIL PROTECTED] writes:
 [ this is back up-thread a bit ]
 And changing that would make it harder to test just the contents of the array
 without having to match bounds as well. That is, You couldn't say list =
 '{1,2}' to test if the array contained 1,2. You would have to, well, I'm not
 even sure how you would test it actually. Maybe something kludgy like
 '{}'::int[] || list = '{1,2}' ?

Given the just-committed changes to avoid having array_push/array_cat
generate non-spec lower bounds unnecessarily, do you still think it's
important to have a variant of array comparison that ignores lower
bounds?

ISTM that ignoring lower bounds is definitely something that violates
the principle of least surprise.  There was an ease-of-use argument
for it before, but now that we changed the other thing I think we don't
need such a kluge.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] Some array semantics issues

2005-11-18 Thread Joe Conway


Tom Lane wrote:

Given the just-committed changes to avoid having array_push/array_cat
generate non-spec lower bounds unnecessarily, do you still think it's
important to have a variant of array comparison that ignores lower
bounds?

ISTM that ignoring lower bounds is definitely something that violates
the principle of least surprise.  There was an ease-of-use argument
for it before, but now that we changed the other thing I think we don't
need such a kluge.


I agree. At this point, having an array with other than 1 as a lower 
bound takes a very conscious decision. I'd think that if you cared that 
much about the lower bound, you'd not want to ignore it when it comes to 
comparison.


Joe


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Some array semantics issues

2005-11-17 Thread Tom Lane

Greg Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 I don't have a lot of use for arguments that go we should remove any
 functionality that's not in the spec ... ISTM that variable lower
 bounds are clearly useful for some applications, and even if they had
 bugs in earlier releases that's not an argument for removing them.

 Normally I don't either. But it's not just functionality that's not in the
 spec. It's functionality that creates behaviour the spec specifies otherwise.

AFAICS the only cases that give rise to arrays with lower bounds other
than one are:
* direct entry of a literal with explicit lower bound;
* assignment to a subscript or slice below 1;
* array_prepend (and the N/N+1-dimension case of array_cat).

I don't think it's not in the spec is a reason for rejecting #1 or #2.
But I agree that there is a reasonable case for modifying array_prepend
and array_cat so that they won't generate non-spec lower bounds where
none existed before.

How about changing them so that the lower bound of the right-hand array
is preserved, rather than decreased by one?

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Some array semantics issues

2005-11-17 Thread Joe Conway


Tom Lane wrote:

AFAICS the only cases that give rise to arrays with lower bounds other
than one are:
* direct entry of a literal with explicit lower bound;
* assignment to a subscript or slice below 1;
* array_prepend (and the N/N+1-dimension case of array_cat).

I don't think it's not in the spec is a reason for rejecting #1 or #2.
But I agree that there is a reasonable case for modifying array_prepend
and array_cat so that they won't generate non-spec lower bounds where
none existed before.

How about changing them so that the lower bound of the right-hand array
is preserved, rather than decreased by one?



That seems reasonable. I'll do it if you'd like...

Joe



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] Some array semantics issues

2005-11-17 Thread Tom Lane

Joe Conway [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 How about changing them so that the lower bound of the right-hand array
 is preserved, rather than decreased by one?

 That seems reasonable. I'll do it if you'd like...

I'll take care of it --- I'm still hacking in that general area anyway.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] Some array semantics issues

2005-11-17 Thread Joe Conway


Tom Lane wrote:

Joe Conway [EMAIL PROTECTED] writes:
... My hope was that eventually anyarray I/O functions 
could eliminate the need to create an array type for every data type you 
wanted to use as an array element.


Interesting thought, but then how do you declare the type of an array
column, or the type of a function argument that's not supposed to range
over every array type?  If we can't use an OID to identify a data type
completely, we're going to have lots of problems.



You only really need two pieces of information to uniquely identify an 
array data type -- the OID of the (leaf-node) scalar elements, and the 
fact that what you have is an array.  Even if it is a nested structure 
of arrays, by recursing (max 5 times), you can eventually find the 
scalar elements. Last year I played around with this and had it 
partially working, but then got too busy to pursue it further.


Joe



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

Re: [HACKERS] Some array semantics issues

2005-11-17 Thread Tom Lane

Joe Conway [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 If we can't use an OID to identify a data type
 completely, we're going to have lots of problems.

 You only really need two pieces of information to uniquely identify an 
 array data type -- the OID of the (leaf-node) scalar elements, and the 
 fact that what you have is an array.

Yes, but replacing one piece of data (type OID) with two (type OID plus
array flag bit) is going to be notationally horrible.  What's more it
will force knowledge about the existence of array types into a lot of
places that don't have to care right now.  For something that is
ultimately just a flange on the side of SQL, that doesn't seem like a
good tradeoff.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Some array semantics issues

2005-11-16 Thread Greg Stark


Tom Lane [EMAIL PROTECTED] writes:

 regression=# select '[0:2]={1,2,3}'::int[] = '{1,2,3}'::int[];
  ?column?
 --
  t
 (1 row)
 
 regression=# select '{1,2,3,4}'::int[] = '{{1,2},{3,4}}'::int[];
  ?column?
 --
  t
 (1 row)
 
 This seems pretty bogus as well.  

The second case seems utterly bogus. But the first case seems maybe
justifiable. maybe. 

In the past Postgres treated the array bounds as so insignificant they weren't
even worth preserving across a dump/restore. 

And changing that would make it harder to test just the contents of the array
without having to match bounds as well. That is, You couldn't say list =
'{1,2}' to test if the array contained 1,2. You would have to, well, I'm not
even sure how you would test it actually. Maybe something kludgy like
'{}'::int[] || list = '{1,2}' ?

I'm not entirely against the idea of making array bounds significant but I
guess we would need some convenient way of taking them out of the picture too.
Perhaps another equality operator.

-- 
greg


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] Some array semantics issues

2005-11-16 Thread Tom Lane

Greg Stark [EMAIL PROTECTED] writes:
 And changing that would make it harder to test just the contents of the array
 without having to match bounds as well.

Fair point, but currently it's impossible to make a comparison that
*does* consider the bounds, which one would think would be the ordinary
meaning of equality.

 I'm not entirely against the idea of making array bounds significant
 but I guess we would need some convenient way of taking them out of
 the picture too.  Perhaps another equality operator.

I could go for a separate operator that has the current behavior
(might as well ignore number of dimensions too, if we're going to
ignore bounds).  Any thoughts about the operator name?

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Some array semantics issues

2005-11-16 Thread Greg Stark


Tom Lane [EMAIL PROTECTED] writes:

 I could go for a separate operator that has the current behavior
 (might as well ignore number of dimensions too, if we're going to
 ignore bounds).  Any thoughts about the operator name?

Well to me these are two different cases. At least the way I see it {1,2} is a
list of two numbers, and {{1,2,},{3,4}} is a list of two lists. They aren't
the same and they don't even contain the same thing.

It occurs to me that it would also make sense to have an operator that
considered the arrays in an order-insensitive comparison. 

It all depends on what you're using the arrays to represent.

If you're implementing something where each slot of the array corresponds to
some specific meaning then you need the array bounds included.

If you're representing stacks where the array bounds march up as they're used
then you don't really want to include the array bounds in your comparison.

If you're representing a denormalized one-to-many relationship (being aware of
all the associated pros and cons of denormalization of course) then you really
don't care about the order at all.

Personally I can't really think of any cases where the shape of the array
doesn't matter though.

-- 
greg


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Some array semantics issues

2005-11-16 Thread Tom Lane

Greg Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 I could go for a separate operator that has the current behavior
 (might as well ignore number of dimensions too, if we're going to
 ignore bounds).  Any thoughts about the operator name?

 Well to me these are two different cases. At least the way I see it {1,2} is a
 list of two numbers, and {{1,2,},{3,4}} is a list of two lists. They aren't
 the same and they don't even contain the same thing.

Well, in that case what do you think about
{{1,2},{3,4},{5,6},{7,8}}
vs
{{1,2,3,4},{5,6,7,8}}
These have the same physical contents and the same number of dimensions,
so unless you want to consider them equal, you have to consider the
dimension values.

I think what we may be dancing around here is that there are some cases
where it makes sense to ignore the lower bounds, as opposed to the axis
lengths.  I'm not convinced that there are any cases where it makes
sense to compare the number of dimensions without comparing the axis
lengths --- but I can see the argument that lower bounds might be
uninteresting, particularly seeing that array_push and and array_cat
do some not-necessarily-always-right things with them.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Some array semantics issues

2005-11-16 Thread Greg Stark

Tom Lane [EMAIL PROTECTED] writes:

 Well, in that case what do you think about
   {{1,2},{3,4},{5,6},{7,8}}
 vs
   {{1,2,3,4},{5,6,7,8}}

In the first case the first element is {1,2} and in the second case the first
element is {1,2,3,4} so from my point of view there's no way these are the
same.

None of the three use cases I conjured would want them considered equal, which
isn't to say there isn't some data structure somewhere out there which would,
but I haven't thought of it.

-- 
greg


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Some array semantics issues

2005-11-16 Thread Tom Lane

Greg Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 Well, in that case what do you think about
 {{1,2},{3,4},{5,6},{7,8}}
 vs
 {{1,2,3,4},{5,6,7,8}}

 In the first case the first element is {1,2} and in the second case the first
 element is {1,2,3,4} so from my point of view there's no way these are the
 same.

Well, then I think we're converging on agreement that array comparison
should always take into account the number of dimensions and the axis
lengths.  What seems still in question is whether to compare or ignore
the axis lower bounds.

I'd argue that ordinary equality should include the lower bounds, but
I'm willing to provide a separate operator (or whole btree opclass
if people want it) that ignores the lower bounds.  We just need a name.
Maybe ~=, ~, etc?

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Some array semantics issues

2005-11-16 Thread David Fetter

On Wed, Nov 16, 2005 at 03:03:53PM -0500, Greg Stark wrote:
 
 Tom Lane [EMAIL PROTECTED] writes:
 
  I could go for a separate operator that has the current behavior
  (might as well ignore number of dimensions too, if we're going to
  ignore bounds).  Any thoughts about the operator name?
 
 Well to me these are two different cases. At least the way I see it
 {1,2} is a list of two numbers, and {{1,2,},{3,4}} is a list of two
 lists. They aren't the same and they don't even contain the same
 thing.

Right.

 It occurs to me that it would also make sense to have an operator
 that considered the arrays in an order-insensitive comparison. 

That sounds more like the SQL:2003 MULTISET, which is essentially
unordered.  Any plans for these?

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Some array semantics issues

2005-11-16 Thread Tom Lane

David Fetter [EMAIL PROTECTED] writes:
 On Wed, Nov 16, 2005 at 03:03:53PM -0500, Greg Stark wrote:
 It occurs to me that it would also make sense to have an operator
 that considered the arrays in an order-insensitive comparison. 

 That sounds more like the SQL:2003 MULTISET, which is essentially
 unordered.  Any plans for these?

Seems to me it would be really expensive to try to make such a
comparison directly with the present array representation.  The
only sensible way to do it would be to sort the elements of the
two arrays (using the comparison operator of the element data type)
and then compare the results.  So you don't actually need a variant
equality operator, you just need a generic array_sort() function,
and then go array_sort(x) = array_sort(y).  Such a function might
have other uses besides this, too.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Some array semantics issues

2005-11-16 Thread Joe Conway


Tom Lane wrote:

Greg Stark [EMAIL PROTECTED] writes:

Tom Lane [EMAIL PROTECTED] writes:


Well, in that case what do you think about
{{1,2},{3,4},{5,6},{7,8}}
vs
{{1,2,3,4},{5,6,7,8}}



In the first case the first element is {1,2} and in the second case the first
element is {1,2,3,4} so from my point of view there's no way these are the
same.


Well, then I think we're converging on agreement that array comparison
should always take into account the number of dimensions and the axis
lengths.  What seems still in question is whether to compare or ignore
the axis lower bounds.

I'd argue that ordinary equality should include the lower bounds, but
I'm willing to provide a separate operator (or whole btree opclass
if people want it) that ignores the lower bounds.  We just need a name.
Maybe ~=, ~, etc?


A couple of thoughts based on the last time I read SQL2003 WRT arrays.

First, the spec only allows arrays to have a lower bound of 1. That 
requirement simplifies a whole lot of things. I don't think that many 
people actually depend on other than 1 as a lower bound (or at least if 
they do, they weren't dumping and reloading those databases prior to 
8.0) -- maybe given other possibly non-backward compatible changes for 
NULLs, we should also change this?


Second, the spec does not really directly allow for multidimensional 
arrays. What it does allow is nesting of arrays. So as Greg states, 
{1,2} is clearly a different array than {1,2,3,4}. I had been thinking 
that when (if?) the array literal parser and related infrastructure is 
rewritten, it should be done so that arrays-as-array-elements are 
processed similar to any scalar element (and perhaps tuples as array 
elements as well). My hope was that eventually anyarray I/O functions 
could eliminate the need to create an array type for every data type you 
wanted to use as an array element.


Joe





---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Some array semantics issues

2005-11-16 Thread Tom Lane

Joe Conway [EMAIL PROTECTED] writes:
 First, the spec only allows arrays to have a lower bound of 1. That 
 requirement simplifies a whole lot of things. I don't think that many 
 people actually depend on other than 1 as a lower bound (or at least if 
 they do, they weren't dumping and reloading those databases prior to 
 8.0) -- maybe given other possibly non-backward compatible changes for 
 NULLs, we should also change this?

I don't have a lot of use for arguments that go we should remove any
functionality that's not in the spec ... ISTM that variable lower
bounds are clearly useful for some applications, and even if they had
bugs in earlier releases that's not an argument for removing them.

 ... My hope was that eventually anyarray I/O functions 
 could eliminate the need to create an array type for every data type you 
 wanted to use as an array element.

Interesting thought, but then how do you declare the type of an array
column, or the type of a function argument that's not supposed to range
over every array type?  If we can't use an OID to identify a data type
completely, we're going to have lots of problems.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] Some array semantics issues

2005-11-16 Thread Greg Stark


Tom Lane [EMAIL PROTECTED] writes:

 Joe Conway [EMAIL PROTECTED] writes:
  First, the spec only allows arrays to have a lower bound of 1. That 
  requirement simplifies a whole lot of things. I don't think that many 
  people actually depend on other than 1 as a lower bound (or at least if 
  they do, they weren't dumping and reloading those databases prior to 
  8.0) -- maybe given other possibly non-backward compatible changes for 
  NULLs, we should also change this?
 
 I don't have a lot of use for arguments that go we should remove any
 functionality that's not in the spec ... ISTM that variable lower
 bounds are clearly useful for some applications, and even if they had
 bugs in earlier releases that's not an argument for removing them.

Normally I don't either. But it's not just functionality that's not in the
spec. It's functionality that creates behaviour the spec specifies otherwise.

That is, if you have an array [1,2] the spec says you can get 1 by referring
to arr[1]. On Postgres you have to take more care. There could easily be code
out there that won't work on Postgres because of this difference.

The main reason for having non-zero lower bounds in the first place was
originally that NULLs weren't allowed in arrays. Otherwise you run into
problems when you try to set arr[5] = 1 when there isn't an arr[1]..arr[4].
But if we have NULLs in arrays then we could easily have all arrays have lower
bounds of 1. We don't even have to store the leading NULL elements.

I think having all arrays start at 1 would actually much simplify the
semantics and avoid a lot of strange corner cases and surprising behaviour
that will follow from having non-1 based arrays.

-- 
greg


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Some array semantics issues

2005-11-16 Thread Joe Conway


Greg Stark wrote:

Tom Lane [EMAIL PROTECTED] writes:

Joe Conway [EMAIL PROTECTED] writes:
First, the spec only allows arrays to have a lower bound of 1. That 
requirement simplifies a whole lot of things. I don't think that many 
people actually depend on other than 1 as a lower bound (or at least if 
they do, they weren't dumping and reloading those databases prior to 
8.0) -- maybe given other possibly non-backward compatible changes for 
NULLs, we should also change this?


I don't have a lot of use for arguments that go we should remove any
functionality that's not in the spec ... ISTM that variable lower
bounds are clearly useful for some applications, and even if they had
bugs in earlier releases that's not an argument for removing them.


Normally I don't either. But it's not just functionality that's not in the
spec. It's functionality that creates behaviour the spec specifies otherwise.


This is an important point. SQL2003 doesn't leave this as undefined:

4.10.2 Arrays
An array is a collection A in which each element is associated with 
exactly one ordinal position in A. If n is the cardinality of A, then 
the ordinal position p of an element is an integer in the range 1 (one) 
= p = n. If EDT is the element type of A, then A can thus be 
considered as a function of the integers in the range 1 (one) to n into EDT.


Joe

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] Some array semantics issues

Re: [HACKERS] Some array semantics issues

Re: [HACKERS] Some array semantics issues

Re: [HACKERS] Some array semantics issues

Re: [HACKERS] Some array semantics issues

Re: [HACKERS] Some array semantics issues

Re: [HACKERS] Some array semantics issues

Re: [HACKERS] Some array semantics issues

Re: [HACKERS] Some array semantics issues

Re: [HACKERS] Some array semantics issues

Re: [HACKERS] Some array semantics issues

Re: [HACKERS] Some array semantics issues

Re: [HACKERS] Some array semantics issues

Re: [HACKERS] Some array semantics issues

Re: [HACKERS] Some array semantics issues

Re: [HACKERS] Some array semantics issues

Re: [HACKERS] Some array semantics issues

Re: [HACKERS] Some array semantics issues

Re: [HACKERS] Some array semantics issues

19 matches

Site Navigation

Mail list logo

Footer information