Re: index unique

2021-06-12 Thread Peter J. Holzer
On 2021-06-11 14:37:57 +0200, Marc Millas wrote:
> anyway, my original post was about the fact that we must create a PK based on
> some kind of work around the limited btree length behaviour thus creating an
> index that will never be of any use, just for technical reasons.

Actually, that index will be used, and it will be used every time you
update, insert, or delete a row - by the logical replication system
which has to identify the affected row(s) on the secondary system.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: index unique

2021-06-11 Thread Marc Millas
Hi Alban,

I plainly agree on the uniqueness thing. and on the fact that a PK with
only a geometry column can be considered somehow "ill suited".
That said, the PK we finally use contains, as said, 3 columns:
--an id (integer column)
--a topology describer (the text column)
--and the geometry column.

this set IS unique in our dataset. and helps provide some quality checks
(wrong id, non coherent tolology,. and some geometry errors..)

anyway, my original post was about the fact that we must create a PK based
on some kind of work around the limited btree length behaviour thus
creating an index that will never be of any use, just for technical reasons.
People in this list have been quite clear that the btree limitation will
NOT change. Which ends the point.

regards

Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com



On Thu, Jun 10, 2021 at 10:33 PM Alban Hertroys  wrote:

>
> > On 8 Jun 2021, at 22:50, Thomas Kellerer  wrote:
> >
> > Marc Millas schrieb am 03.06.2021 um 22:51:
> >> on a table we need a primary key and to get a unique combinaison, we
> need 3 columns of that table:
> >> 1 of type integer,
> >> 1 of type text,
> >> 1 of type geometry
> >>
> >
> > How do you define the "uniqueness" of the geometry?
>
> That is actually the big question here. Multiple “unique” geometries can
> specify the same geometry!
>
> A geom as simple as a line from (0,0) - (1,0) can just as easily be
> specified as (1,0) - (0,0). That’s the simplest case, and one could argue
> that the point of origin is different, but the next example would be a
> triangle starting at the same origin but traversed in different directions.
> It gets harder the more vertices a polygon has.
>
> I would argue that a geometry type is ill-suited as a primary key column
> candidate.
>
> Now, of course, the OP could have a case where their geometries are
> guaranteed to be unique regardless, but they’d better make sure before
> adding them to the PK.
>
> Alban Hertroys
> --
> If you can't see the forest for the trees,
> cut the trees and you'll find there is no forest.
>
>
>
>


Re: index unique

2021-06-10 Thread Alban Hertroys


> On 8 Jun 2021, at 22:50, Thomas Kellerer  wrote:
> 
> Marc Millas schrieb am 03.06.2021 um 22:51:
>> on a table we need a primary key and to get a unique combinaison, we need 3 
>> columns of that table:
>> 1 of type integer,
>> 1 of type text,
>> 1 of type geometry
>> 
> 
> How do you define the "uniqueness" of the geometry?

That is actually the big question here. Multiple “unique” geometries can 
specify the same geometry!

A geom as simple as a line from (0,0) - (1,0) can just as easily be specified 
as (1,0) - (0,0). That’s the simplest case, and one could argue that the point 
of origin is different, but the next example would be a triangle starting at 
the same origin but traversed in different directions. It gets harder the more 
vertices a polygon has.

I would argue that a geometry type is ill-suited as a primary key column 
candidate.

Now, of course, the OP could have a case where their geometries are guaranteed 
to be unique regardless, but they’d better make sure before adding them to the 
PK.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: index unique

2021-06-10 Thread Marc Millas
Thanks Thomas,

but, as stated after the first post, the need was for a PK as asked by
postgres (ie. for tech needs, not for functionnal needs)
up to now, looks like we must create a PK (and so, the associated index)
just to answer logical replication needs.(and qgis which also needs a PK)
that index  (some kind of hash on the geom column + the other 2).  have no
meaning on a functionnal point of view, and there are chances that it will
never be used by postgres  in normal use..
as the hash will not help on any topological request...

that was the reason of the first mail: as we must create a PK, is there any
way to make something useful and not this unuseful "thing" ?




Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com



On Tue, Jun 8, 2021 at 10:51 PM Thomas Kellerer  wrote:

> Marc Millas schrieb am 03.06.2021 um 22:51:
> > on a table we need a primary key and to get a unique combinaison, we
> need 3 columns of that table:
> > 1 of type integer,
> > 1 of type text,
> > 1 of type geometry
> >
> > creating the PK constraint doesn work: (even with our current small data
> set)
> > ERROR:  index row size 6072 exceeds btree version 4 maximum 2704 for
> index "xxx_spkey"
> > DETAIL:  Index row references tuple (32,1) in relation "xxx".
> > HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
> > Consider a function index of an MD5 hash of the value, or use full text
> indexing.
> >
> > ok. we can do this.
> > but if so, we need to create a gist index on the geometry column to do
> any topology request.
> > so 2 indexes containing this single column.
> >
> > if we install extension btree_gist, no pb to create an index on all 3
> columns.
> > but as gist does not support unicity, this index cannot be used for the
> PK.
> >
> > OK, we may try to use a function to get the bounding box around the
> geometry objects and use the result into a btree index
> >
> > Any idea (I mean: another idea !) to tackle this ?
> > Or any critic on the "solution" ??
>
> How do you define the "uniqueness" of the geometry?
>
> GIST can support "uniqueness" through exclusion constraints.
> It's not a primary key, so you can't create foreign keys referencing that
> table,
> but it does ensure uniqueness (In fact the "normal" unique indexes are
> essentially a special case of exclusion constraints)
>
> create index on the_table using gist (int_column with =, text_col with
> =, geometry_col with &&);
>
> Replace the && operator with whatever is appropriate for your use case.
>
> Thomas
>
>
>


Re: index unique

2021-06-08 Thread Thomas Kellerer

Marc Millas schrieb am 03.06.2021 um 22:51:

on a table we need a primary key and to get a unique combinaison, we need 3 
columns of that table:
1 of type integer,
1 of type text,
1 of type geometry

creating the PK constraint doesn work: (even with our current small data set)
ERROR:  index row size 6072 exceeds btree version 4 maximum 2704 for index 
"xxx_spkey"
DETAIL:  Index row references tuple (32,1) in relation "xxx".
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text 
indexing.

ok. we can do this.
but if so, we need to create a gist index on the geometry column to do any 
topology request.
so 2 indexes containing this single column.

if we install extension btree_gist, no pb to create an index on all 3 columns.
but as gist does not support unicity, this index cannot be used for the PK.

OK, we may try to use a function to get the bounding box around the geometry 
objects and use the result into a btree index

Any idea (I mean: another idea !) to tackle this ?
Or any critic on the "solution" ??


How do you define the "uniqueness" of the geometry?

GIST can support "uniqueness" through exclusion constraints.
It's not a primary key, so you can't create foreign keys referencing that table,
but it does ensure uniqueness (In fact the "normal" unique indexes are
essentially a special case of exclusion constraints)

   create index on the_table using gist (int_column with =, text_col with =, 
geometry_col with &&);

Replace the && operator with whatever is appropriate for your use case.

Thomas




Re: index unique

2021-06-08 Thread Tom Lane
Marc Millas  writes:
> I know, its clearly stated in postgres doc about btree," The only
> limitation is that an index entry cannot exceed approximately one-third of
> a page (after TOAST compression, if applicable)."

Yup.

> is there any plan to adress this ?

No.  The cost/benefit ratio seems completely untenable.

The usual recommendation is that you shorten long values by hashing
them, eg create a unique index on md5(my_long_column).

regards, tom lane




Re: index unique

2021-06-08 Thread David G. Johnston
On Tuesday, June 8, 2021, Marc Millas  wrote:

>
> but as this looks quite hard coded, it means that for  long utf8 things
> the data length is not so long before hitting the limit.
>
> is there any plan to adress this ?
>

None that I’ve seen, and I don’t expect to see one either.  Mainly because
I’ve yet to see or think of a use case that would warrant even considering
doing such, ignoring the fact that actually changing such a value is likely
to be a non-starter from a compatability perspective.

David J.


Re: index unique

2021-06-08 Thread Marc Millas
Cristal clear !
and it have to be the case as my test was done with some not so random data.

but this mean that we cannot put a bunch of datatypes in a PK, as soon as
it may be longer than 2701.
I know, its clearly stated in postgres doc about btree," The only
limitation is that an index entry cannot exceed approximately one-third of
a page (after TOAST compression, if applicable)."
but as this looks quite hard coded, it means that for  long utf8 things the
data length is not so long before hitting the limit.

is there any plan to adress this ?
kind of index toast ?? :-)

thanks


Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com



On Tue, Jun 8, 2021 at 8:42 PM Peter J. Holzer  wrote:

> On 2021-06-08 18:30:16 +0200, Marc Millas wrote:
> > the only pb observed is the size of the object accepted. if the geom is
> a bit
> > "big" then the index errors.about btree size of index object.
> > but if I create a table test_l with a text column blabla as a PK, and
> insert a
> > 100 000 character long string , no pb.
> > if I do an explain analyze select blabla from test_l order by blabla,
> postgres
> > tells me that it did an index only scan.
> > so, I wonder why Postgres is able to put a 100 000 long text in a btree
> index
> > and NOT a geom column which wkt is 10 000 bytes long.
>
> Good question. Maybe your texts compresses better than your geometries:
>
> hjp=> insert into t(t) values(repeat('a', 235327) || '1');
> INSERT 0 1
> Time: 60.057 ms
> hjp=> insert into t(t) values(repeat('a', 235328) || '1');
> ERROR:  index row size 2720 exceeds maximum 2712 for index "t_pkey1"
> HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
> Consider a function index of an MD5 hash of the value, or use full text
> indexing.
> Time: 58.751 ms
>
> Note the difference between the length of the string I was trying to
> insert and the length of the row it complains about.
>
> hp
>
> --
>_  | Peter J. Holzer| Story must make more sense than reality.
> |_|_) ||
> | |   | h...@hjp.at |-- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |   challenge!"
>


Re: index unique

2021-06-08 Thread Peter J. Holzer
On 2021-06-08 18:30:16 +0200, Marc Millas wrote:
> the only pb observed is the size of the object accepted. if the geom is a bit
> "big" then the index errors.about btree size of index object.
> but if I create a table test_l with a text column blabla as a PK, and insert a
> 100 000 character long string , no pb.
> if I do an explain analyze select blabla from test_l order by blabla, postgres
> tells me that it did an index only scan.
> so, I wonder why Postgres is able to put a 100 000 long text in a btree index
> and NOT a geom column which wkt is 10 000 bytes long.

Good question. Maybe your texts compresses better than your geometries:

hjp=> insert into t(t) values(repeat('a', 235327) || '1');
INSERT 0 1
Time: 60.057 ms
hjp=> insert into t(t) values(repeat('a', 235328) || '1');
ERROR:  index row size 2720 exceeds maximum 2712 for index "t_pkey1"
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text 
indexing.
Time: 58.751 ms

Note the difference between the length of the string I was trying to
insert and the length of the row it complains about.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: index unique

2021-06-08 Thread Marc Millas
quite funny to see how a tech question seems to end into an english grammar
thing :-)

quote
> You make this sound like an either-or proposition,

While he is talking about *a* primary key, it should be *the* primary
key. There can be only one (that's why it is the primary key).

There can be several unique keys, though.

> but personally it takes a very exceptional circumstance to forgo
> defining a unique natural key.

True, but not what he's talking about.
end quote


when I did write the question, I write: "on a table we need a primary key"
its quite obvious that there is only one PK, but, at that stage, I
dont need one for functional reasons
so, I need "a" primary key,  whatever it is, just because  postgres logical
replication needs one to accept updates.

then, to build "the" needed PK, we need to decide which columns or set of
will be appropriate
as it has to be able to identify something unique. and this is how we get
to this list of 3 columns.

So, ok, this set doesn't work asis.
all solutions we thought of are just tech workaround, providing no
functionnal meaning.

still, its somehow surprising: (at least to me !)
postgres has no problem creating a btree for a geometry column, and, so, no
problem for putting a geom column in a PK.
(very different from Oracle...)


the only pb observed is the size of the object accepted. if the geom is a
bit "big" then the index errors.about btree size of index object.
but if I create a table test_l with a text column blabla as a PK, and
insert a 100 000 character long string , no pb.
if I do an explain analyze select blabla from test_l order by blabla,
postgres tells me that it did an index only scan.
so, I wonder why Postgres is able to put a 100 000 long text in a btree
index and NOT a geom column which wkt is 10 000 bytes long.

there, sure, is a reason.

please, let me know :-)





Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com



On Tue, Jun 8, 2021 at 12:09 AM Peter J. Holzer  wrote:

> On 2021-06-07 10:20:22 -0700, David G. Johnston wrote:
> > On Sun, Jun 6, 2021 at 11:55 PM Peter J. Holzer 
> wrote:
> > On 2021-06-03 22:51:55 +0200, Marc Millas wrote:
> > > postgres 12 with postgis.
> > > on a table we need a primary key and to get a unique combinaison,
> we need
> > > 3 columns of that table:
> > > 1 of type integer,
> > > 1 of type text,
> > > 1 of type geometry
> > >
> > > creating the PK constraint doesn work:
> >
> > I find that if a natural primary key candidate is so complex, it is
> > usually better to use a surrogate key.
> >
> >
> > You make this sound like an either-or proposition,
>
> While he is talking about *a* primary key, it should be *the* primary
> key. There can be only one (that's why it is the primary key).
>
> There can be several unique keys, though.
>
> > but personally it takes a very exceptional circumstance to forgo
> > defining a unique natural key.
>
> True, but not what he's talking about.
>
> hp
>
> --
>_  | Peter J. Holzer| Story must make more sense than reality.
> |_|_) ||
> | |   | h...@hjp.at |-- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |   challenge!"
>


Re: index unique

2021-06-07 Thread Peter J. Holzer
On 2021-06-07 10:20:22 -0700, David G. Johnston wrote:
> On Sun, Jun 6, 2021 at 11:55 PM Peter J. Holzer  wrote:
> On 2021-06-03 22:51:55 +0200, Marc Millas wrote:
> > postgres 12 with postgis.
> > on a table we need a primary key and to get a unique combinaison, we 
> need
> > 3 columns of that table:
> > 1 of type integer,
> > 1 of type text,
> > 1 of type geometry
> >
> > creating the PK constraint doesn work:
> 
> I find that if a natural primary key candidate is so complex, it is
> usually better to use a surrogate key.
> 
> 
> You make this sound like an either-or proposition,

While he is talking about *a* primary key, it should be *the* primary
key. There can be only one (that's why it is the primary key).

There can be several unique keys, though.

> but personally it takes a very exceptional circumstance to forgo
> defining a unique natural key.

True, but not what he's talking about.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: index unique

2021-06-07 Thread David G. Johnston
On Sun, Jun 6, 2021 at 11:55 PM Peter J. Holzer  wrote:

> On 2021-06-03 22:51:55 +0200, Marc Millas wrote:
> > postgres 12 with postgis.
> > on a table we need a primary key and to get a unique combinaison, we
> need 3
> > columns of that table:
> > 1 of type integer,
> > 1 of type text,
> > 1 of type geometry
> >
> > creating the PK constraint doesn work:
>
> I find that if a natural primary key candidate is so complex, it is
> usually better to use a surrogate key.
>

You make this sound like an either-or proposition, but personally it takes
a very exceptional circumstance to forgo defining a unique natural key.
Whether I choose to supplement that with a surrogate key is a different
matter altogether.

In this case identity would seem to make more sense using labels, not
composition.  For instance, on a substance table I would have the
identifier for water be "H20", and have the chemical composition of water
be "H20" (not the best example...).  In this case ensuring uniqueness of
the formula-to-label dependency would be trivial to implement and I
probably would too - but in the case of geometry I'd just accept that using
an index to do this would not be possible and, if I really needed
reassurance of geometry uniqueness, I would do so in triggers.

David J.


Re: index unique

2021-06-07 Thread Peter J. Holzer
On 2021-06-03 22:51:55 +0200, Marc Millas wrote:
> postgres 12 with postgis.
> on a table we need a primary key and to get a unique combinaison, we need 3
> columns of that table:
> 1 of type integer,
> 1 of type text,
> 1 of type geometry
> 
> creating the PK constraint doesn work:

I find that if a natural primary key candidate is so complex, it is
usually better to use a surrogate key. 

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


signature.asc
Description: PGP signature


Re: index unique

2021-06-03 Thread Marc Millas
Hi Paul,

obviously the bounding box is not a perfect choice. we also think of one of
the "centers" point, but quite similar non uniqueness.

so, if no "tech" solution, we continue to work with the business to try to
find an appropriate PK

thanks

Marc MILLAS
Senior Architect
+33607850334
www.mokadb.com



On Thu, Jun 3, 2021 at 10:57 PM Paul Ramsey 
wrote:

> Primary key is going to be a BTREE index. I'm surprised you require the
> geometry in order to achieve uniqueness?
> You can't put the geometry into a BTREE because it's too large.
> You could add a column and stick the MD5 hash of the geometry there, and
> use that as the last piece of uniqueness?
> If the bbox of the geometry is "good enough" for your key, that kind of
> speaks to the idea that maybe your geometry doesn't need to be part of the
> PK? It's possible for different geometries to have the same bbox...
> ATB,
> P
>
> > On Jun 3, 2021, at 1:51 PM, Marc Millas  wrote:
> >
> > Hi,
> > postgres 12 with postgis.
> > on a table we need a primary key and to get a unique combinaison, we
> need 3 columns of that table:
> > 1 of type integer,
> > 1 of type text,
> > 1 of type geometry
> >
> > creating the PK constraint doesn work: (even with our current small data
> set)
> > ERROR:  index row size 6072 exceeds btree version 4 maximum 2704 for
> index "xxx_spkey"
> > DETAIL:  Index row references tuple (32,1) in relation "xxx".
> > HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
> > Consider a function index of an MD5 hash of the value, or use full text
> indexing.
> >
> > ok. we can do this.
> > but if so, we need to create a gist index on the geometry column to do
> any topology request.
> > so 2 indexes containing this single column.
> >
> > if we install extension btree_gist, no pb to create an index on all 3
> columns.
> > but as gist does not support unicity, this index cannot be used for the
> PK.
> >
> > OK, we may try to use a function to get the bounding box around the
> geometry objects and use the result into a btree index
> >
> > Any idea (I mean: another idea !) to tackle this ?
> > Or any critic on the "solution" ??
> >
> > thanks,
> >
> >
> > Marc MILLAS
> > Senior Architect
> > +33607850334
> > www.mokadb.com
> >
>
>


Re: index unique

2021-06-03 Thread Paul Ramsey
Primary key is going to be a BTREE index. I'm surprised you require the 
geometry in order to achieve uniqueness?
You can't put the geometry into a BTREE because it's too large. 
You could add a column and stick the MD5 hash of the geometry there, and use 
that as the last piece of uniqueness?
If the bbox of the geometry is "good enough" for your key, that kind of speaks 
to the idea that maybe your geometry doesn't need to be part of the PK? It's 
possible for different geometries to have the same bbox...
ATB,
P

> On Jun 3, 2021, at 1:51 PM, Marc Millas  wrote:
> 
> Hi,
> postgres 12 with postgis.
> on a table we need a primary key and to get a unique combinaison, we need 3 
> columns of that table:
> 1 of type integer,
> 1 of type text,
> 1 of type geometry
> 
> creating the PK constraint doesn work: (even with our current small data set)
> ERROR:  index row size 6072 exceeds btree version 4 maximum 2704 for index 
> "xxx_spkey"
> DETAIL:  Index row references tuple (32,1) in relation "xxx".
> HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
> Consider a function index of an MD5 hash of the value, or use full text 
> indexing.
> 
> ok. we can do this.
> but if so, we need to create a gist index on the geometry column to do any 
> topology request.
> so 2 indexes containing this single column.
> 
> if we install extension btree_gist, no pb to create an index on all 3 columns.
> but as gist does not support unicity, this index cannot be used for the PK.
> 
> OK, we may try to use a function to get the bounding box around the geometry 
> objects and use the result into a btree index
> 
> Any idea (I mean: another idea !) to tackle this ?
> Or any critic on the "solution" ??
> 
> thanks,
> 
> 
> Marc MILLAS
> Senior Architect
> +33607850334
> www.mokadb.com
>