Re: [PATCH] ltree hash functions

2024-03-21 Thread Tom Lane
jian he  writes:
> I also made the following changes:
> from

> uint64 levelHash = hash_any_extended((unsigned char *) al->name, al->len, 
> seed);
> uint32 levelHash = hash_any((unsigned char *) al->name, al->len);

> to
> uint64 levelHash = DatumGetUInt64(hash_any_extended((unsigned char *)
> al->name, al->len, seed));
> uint32 levelHash = DatumGetUInt32(hash_any((unsigned char *) al->name,
> al->len));

Yeah, that'd fail on 32-bit machines.

Pushed v5 with some minor cosmetic tweaking.

regards, tom lane




Re: [PATCH] ltree hash functions

2024-02-04 Thread jian he
On Thu, Feb 1, 2024 at 11:11 PM vignesh C  wrote:
>
> On Wed, 6 Dec 2023 at 04:08, Tommy Pavlicek  wrote:
> >
> > Thanks.
> >
> > I've attached the latest version that updates the naming in line with
> > the convention.
> >
> > On Mon, Dec 4, 2023 at 12:46 AM jian he  wrote:
> > >
> > > On Fri, Dec 1, 2023 at 8:44 AM Tommy Pavlicek  
> > > wrote:
> > > >
> > > >
> > > > Patch updated for those comments (and a touch of cleanup in the tests) 
> > > > attached.
> > >
> > > it would be a better name as hash_ltree than ltree_hash, similar logic
> > > applies to ltree_hash_extended.
> > > that would be the convention. see: 
> > > https://stackoverflow.com/a/69650940/15603477
> > >
> > >
> > > Other than that, it looks good.
>
> CFBot shows one of the test is failing as in [1]:
> diff -U3 /tmp/cirrus-ci-build/contrib/ltree/expected/ltree.out
> /tmp/cirrus-ci-build/build-32/testrun/ltree/regress/results/ltree.out
> --- /tmp/cirrus-ci-build/contrib/ltree/expected/ltree.out 2024-01-31
> 15:18:42.893039599 +
> +++ /tmp/cirrus-ci-build/build-32/testrun/ltree/regress/results/ltree.out
> 2024-01-31 15:23:25.309028749 +
> @@ -1442,9 +1442,14 @@
> ('0.1.2'::ltree), ('0'::ltree), ('0_asd.1_ASD'::ltree)) x(v)
>  WHERE  hash_ltree(v)::bit(32) != hash_ltree_extended(v, 0)::bit(32)
> OR hash_ltree(v)::bit(32) = hash_ltree_extended(v, 1)::bit(32);
> - value | standard | extended0 | extended1
> +--+---+---
> -(0 rows)
> +value| standard |
> extended0 |extended1
> +-+--+--+--
> + 0   | 10001010100010011011 |
> 0101100001000100011001011011 | 010110000100010001101001
> + 0.1 | 101010001010110001001110 |
> 0000100010001100110111010101 | 0000100010001101100011010101
> + 0.1.2   | 0111010101110100 |
> 1010111001110101100011010111 | 101011100111011100100011
> + 0   | 10001010100010011011 |
> 0101100001000100011001011011 | 010110000100010001101001
> + 0_asd.1_ASD | 011000101000101001001101 |
> 0000100010001100110111010101 | 0000100010001101100011010101
> +(5 rows)
>
> Please post an updated version for the same.
>
> [1] - 
> https://api.cirrus-ci.com/v1/artifact/task/5572544858685440/testrun/build-32/testrun/ltree/regress/regression.diffs
>

It only fails on Linux - Debian Bullseye - Meson.
I fixed the white space, named it v5.
I also made the following changes:
from

uint64 levelHash = hash_any_extended((unsigned char *) al->name, al->len, seed);
uint32 levelHash = hash_any((unsigned char *) al->name, al->len);

to
uint64 levelHash = DatumGetUInt64(hash_any_extended((unsigned char *)
al->name, al->len, seed));
uint32 levelHash = DatumGetUInt32(hash_any((unsigned char *) al->name,
al->len));

(these two line live in different functions)

I have some problems testing it locally, so I post the patch.
From ffc0e3daea5e490d8b9e6b7a0345f780d88b5624 Mon Sep 17 00:00:00 2001
From: jian he 
Date: Fri, 2 Feb 2024 16:03:56 +0800
Subject: [PATCH v5 1/1] add hash functions for the ltree extension.

---
 contrib/ltree/Makefile|  2 +-
 contrib/ltree/expected/ltree.out  | 65 +
 contrib/ltree/ltree--1.2--1.3.sql | 23 +++
 contrib/ltree/ltree.control   |  2 +-
 contrib/ltree/ltree_op.c  | 68 +++
 contrib/ltree/ltreetest.sql   |  1 +
 contrib/ltree/meson.build |  1 +
 contrib/ltree/sql/ltree.sql   | 43 +++
 doc/src/sgml/ltree.sgml   |  8 
 9 files changed, 211 insertions(+), 2 deletions(-)
 create mode 100644 contrib/ltree/ltree--1.2--1.3.sql

diff --git a/contrib/ltree/Makefile b/contrib/ltree/Makefile
index 770769a7..f50f2c94 100644
--- a/contrib/ltree/Makefile
+++ b/contrib/ltree/Makefile
@@ -14,7 +14,7 @@ OBJS = \
 	ltxtquery_op.o
 
 EXTENSION = ltree
-DATA = ltree--1.1--1.2.sql ltree--1.1.sql ltree--1.0--1.1.sql
+DATA = ltree--1.2--1.3.sql ltree--1.1--1.2.sql ltree--1.1.sql ltree--1.0--1.1.sql
 PGFILEDESC = "ltree - hierarchical label data type"
 
 HEADERS = ltree.h
diff --git a/contrib/ltree/expected/ltree.out b/contrib/ltree/expected/ltree.out
index 984cd030..dd4f0fc1 100644
--- a/contrib/ltree/expected/ltree.out
+++ b/contrib/ltree/expected/ltree.out
@@ -1433,8 +1433,27 @@ SELECT '{j.k.l.m, g.b.c.d.e}'::ltree[] ?~ 'A*@|g.b.c.d.e';
  g.b.c.d.e
 (1 row)
 
+-- Check that the hash_ltree() and hash_ltree_extended() function's lower
+-- 32 bits match when the seed is 0 and do not match when the seed != 0
+SELECT v as value, hash_ltree(v)::bit(32) as standard,
+   hash_ltree_extended(v, 0)::bit(32) as extended0,
+   hash_ltree_extended(v, 1)::bit(32) as extended1
+FROM   (VALUES (NULL::ltree), (''::ltree), ('0'::ltree), ('0.1'::ltree),
+   ('0.1.2'::ltree), ('0':

Re: [PATCH] ltree hash functions

2024-02-01 Thread vignesh C
On Wed, 6 Dec 2023 at 04:08, Tommy Pavlicek  wrote:
>
> Thanks.
>
> I've attached the latest version that updates the naming in line with
> the convention.
>
> On Mon, Dec 4, 2023 at 12:46 AM jian he  wrote:
> >
> > On Fri, Dec 1, 2023 at 8:44 AM Tommy Pavlicek  wrote:
> > >
> > >
> > > Patch updated for those comments (and a touch of cleanup in the tests) 
> > > attached.
> >
> > it would be a better name as hash_ltree than ltree_hash, similar logic
> > applies to ltree_hash_extended.
> > that would be the convention. see: 
> > https://stackoverflow.com/a/69650940/15603477
> >
> >
> > Other than that, it looks good.

CFBot shows one of the test is failing as in [1]:
diff -U3 /tmp/cirrus-ci-build/contrib/ltree/expected/ltree.out
/tmp/cirrus-ci-build/build-32/testrun/ltree/regress/results/ltree.out
--- /tmp/cirrus-ci-build/contrib/ltree/expected/ltree.out 2024-01-31
15:18:42.893039599 +
+++ /tmp/cirrus-ci-build/build-32/testrun/ltree/regress/results/ltree.out
2024-01-31 15:23:25.309028749 +
@@ -1442,9 +1442,14 @@
('0.1.2'::ltree), ('0'::ltree), ('0_asd.1_ASD'::ltree)) x(v)
 WHERE  hash_ltree(v)::bit(32) != hash_ltree_extended(v, 0)::bit(32)
OR hash_ltree(v)::bit(32) = hash_ltree_extended(v, 1)::bit(32);
- value | standard | extended0 | extended1
+--+---+---
-(0 rows)
+value| standard |
extended0 |extended1
+-+--+--+--
+ 0   | 10001010100010011011 |
0101100001000100011001011011 | 010110000100010001101001
+ 0.1 | 101010001010110001001110 |
0000100010001100110111010101 | 0000100010001101100011010101
+ 0.1.2   | 0111010101110100 |
1010111001110101100011010111 | 101011100111011100100011
+ 0   | 10001010100010011011 |
0101100001000100011001011011 | 010110000100010001101001
+ 0_asd.1_ASD | 011000101000101001001101 |
0000100010001100110111010101 | 0000100010001101100011010101
+(5 rows)

Please post an updated version for the same.

[1] - 
https://api.cirrus-ci.com/v1/artifact/task/5572544858685440/testrun/build-32/testrun/ltree/regress/regression.diffs

Regards,
Vignesh




Re: [PATCH] ltree hash functions

2023-12-05 Thread Tommy Pavlicek
Thanks.

I've attached the latest version that updates the naming in line with
the convention.

On Mon, Dec 4, 2023 at 12:46 AM jian he  wrote:
>
> On Fri, Dec 1, 2023 at 8:44 AM Tommy Pavlicek  wrote:
> >
> >
> > Patch updated for those comments (and a touch of cleanup in the tests) 
> > attached.
>
> it would be a better name as hash_ltree than ltree_hash, similar logic
> applies to ltree_hash_extended.
> that would be the convention. see: 
> https://stackoverflow.com/a/69650940/15603477
>
>
> Other than that, it looks good.


0001-ltree-hash-v4.patch
Description: Binary data


Re: [PATCH] ltree hash functions

2023-12-03 Thread jian he
On Fri, Dec 1, 2023 at 8:44 AM Tommy Pavlicek  wrote:
>
>
> Patch updated for those comments (and a touch of cleanup in the tests) 
> attached.

it would be a better name as hash_ltree than ltree_hash, similar logic
applies to ltree_hash_extended.
that would be the convention. see: https://stackoverflow.com/a/69650940/15603477


Other than that, it looks good.




Re: [PATCH] ltree hash functions

2023-11-30 Thread Tommy Pavlicek
On Tue, Nov 28, 2023 at 7:38 PM jian he  wrote:
> you only change Makefile, you also need to change contrib/ltree/meson.build?
> Do you need to use EXPLAIN to demo the index usage?

Thanks! Yes, I missed the Meson build file. I added additional
commands with EXPLAIN (COSTS OFF) as I found in other places.

Patch updated for those comments (and a touch of cleanup in the tests) attached.


0001-ltree-hash-v3.patch
Description: Binary data


Re: [PATCH] ltree hash functions

2023-11-28 Thread jian he
On Wed, Nov 29, 2023 at 6:09 AM Tommy Pavlicek  wrote:
>
> On Thu, Jul 6, 2023 at 2:18 AM Daniel Gustafsson  wrote:
> >
> > > On 19 Jun 2023, at 11:18, Tommy Pavlicek  wrote:
> >
> > > Tommy, are you interested in extending ALTER OPERATOR to allow this,
> > > which would also allow fixing the ltree operator?
> > >
> > > Yes, I can do that. I took a look over the code and email thread and it 
> > > seems like it should be relatively straight forward. I'll put a patch 
> > > together for that and then update this patch to alter the operator.
> >
> > Did you have a chance to look at this for an updated patch for this 
> > commitfest?
>
> I finally had a chance to look at this and I've updated the patch to
> alter the = operator to enable hash joins.
>
> This is ready to be looked at now.
>
> Is there anything I need to do to move this forward?
>

you only change Makefile, you also need to change contrib/ltree/meson.build?

+drop index tstidx;
+create index tstidx on ltreetest using hash (t);
+set enable_seqscan=off;
+
+SELECT * FROM ltreetest WHERE t =  '12.3' order by t asc;

Do you need to use EXPLAIN to demo the index usage?




Re: [PATCH] ltree hash functions

2023-11-28 Thread Tommy Pavlicek
On Thu, Jul 6, 2023 at 2:18 AM Daniel Gustafsson  wrote:
>
> > On 19 Jun 2023, at 11:18, Tommy Pavlicek  wrote:
>
> > Tommy, are you interested in extending ALTER OPERATOR to allow this,
> > which would also allow fixing the ltree operator?
> >
> > Yes, I can do that. I took a look over the code and email thread and it 
> > seems like it should be relatively straight forward. I'll put a patch 
> > together for that and then update this patch to alter the operator.
>
> Did you have a chance to look at this for an updated patch for this 
> commitfest?

I finally had a chance to look at this and I've updated the patch to
alter the = operator to enable hash joins.

This is ready to be looked at now.

Is there anything I need to do to move this forward?

Cheers,
Tommy


0001-ltree-hash-v2.patch
Description: Binary data


Re: [PATCH] ltree hash functions

2023-07-06 Thread Daniel Gustafsson
> On 19 Jun 2023, at 11:18, Tommy Pavlicek  wrote:

> Tommy, are you interested in extending ALTER OPERATOR to allow this,
> which would also allow fixing the ltree operator?
> 
> Yes, I can do that. I took a look over the code and email thread and it seems 
> like it should be relatively straight forward. I'll put a patch together for 
> that and then update this patch to alter the operator.

Did you have a chance to look at this for an updated patch for this commitfest?

--
Daniel Gustafsson





Re: [PATCH] ltree hash functions

2023-06-19 Thread Tommy Pavlicek
>
> FWIW the CREATE OPERATOR documentation only talks about hash joins for

HASHES, maybe it should be updated to also mention hash aggregates?


I think I might have been a bit unclear here, the hash aggregate does work
without altering the operator so it's just the join that's blocked. Sorry
about the confusion.

I wonder what's the use case for this. I wonder how often people join on
> ltree, for example. Did you just notice ltree can't hash and decided to
> fix that, or do you have a practical use case / need for this feature?


I mostly want to add hash indexes. Beyond selecting specific values, you
can use them to get ancestors (trim the path and do an exact select) and
descendents (using a functional index calculating the parent path for each
row). For example, I've found it can be faster to calculate the path of
every ancestor and use select ltree path = ANY([ancestor paths]) compared
to using a gist index. It's not ideal, but unfortunately I've found that
with enough rows, gist indexes get very large and slow. Btree indexes are
better, but for ltree they can still be up to around 10x bigger than a hash
index. I've also seen ltree hash indexes outperform btree indexes in very
large tables, but I suspect in most cases they'll be similar.

Tommy, are you interested in extending ALTER OPERATOR to allow this,
> which would also allow fixing the ltree operator?


Yes, I can do that. I took a look over the code and email thread and it
seems like it should be relatively straight forward. I'll put a patch
together for that and then update this patch to alter the operator.

On Sat, Jun 17, 2023 at 9:57 PM Tomas Vondra 
wrote:

>
>
> On 6/17/23 20:19, Tom Lane wrote:
> > Tomas Vondra  writes:
> >> I guess the "correct" solution would be to extend ALTER OPERATOR. I
> >> wonder why it's not supported - it's clearly an intentional decision
> >> (per comment in AlterOperator). So what might break if this changes for
> >> an existing operator?
> >
> > This code was added by commit 321eed5f0.  The thread leading up to
> > that commit is here:
> >
> > https://www.postgresql.org/message-id/flat/3348985.V7xMLFDaJO%40dinodell
> >
> > There are some nontrivial concerns in there about breaking the
> > semantics of existing exclusion constraints, for instance.  I think
> > we mostly rejected the concern about invalidation of cached plans
> > as already-covered, but that wasn't the only problem.
> >
> > However, I think we could largely ignore the issues if we restricted
> > ALTER OPERATOR to only add commutator, negator, hashes, or merges
> > properties to operators that lacked them before --- which'd be the
> > primary if not only use-case anyway.  That direction can't break
> > anything.
> >
>
> Sound reasonable.
>
> Tommy, are you interested in extending ALTER OPERATOR to allow this,
> which would also allow fixing the ltree operator?
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [PATCH] ltree hash functions

2023-06-17 Thread Tomas Vondra



On 6/17/23 20:19, Tom Lane wrote:
> Tomas Vondra  writes:
>> I guess the "correct" solution would be to extend ALTER OPERATOR. I
>> wonder why it's not supported - it's clearly an intentional decision
>> (per comment in AlterOperator). So what might break if this changes for
>> an existing operator?
> 
> This code was added by commit 321eed5f0.  The thread leading up to
> that commit is here:
> 
> https://www.postgresql.org/message-id/flat/3348985.V7xMLFDaJO%40dinodell
> 
> There are some nontrivial concerns in there about breaking the
> semantics of existing exclusion constraints, for instance.  I think
> we mostly rejected the concern about invalidation of cached plans
> as already-covered, but that wasn't the only problem.
> 
> However, I think we could largely ignore the issues if we restricted
> ALTER OPERATOR to only add commutator, negator, hashes, or merges
> properties to operators that lacked them before --- which'd be the
> primary if not only use-case anyway.  That direction can't break
> anything.
> 

Sound reasonable.

Tommy, are you interested in extending ALTER OPERATOR to allow this,
which would also allow fixing the ltree operator?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH] ltree hash functions

2023-06-17 Thread Tom Lane
Tomas Vondra  writes:
> I guess the "correct" solution would be to extend ALTER OPERATOR. I
> wonder why it's not supported - it's clearly an intentional decision
> (per comment in AlterOperator). So what might break if this changes for
> an existing operator?

This code was added by commit 321eed5f0.  The thread leading up to
that commit is here:

https://www.postgresql.org/message-id/flat/3348985.V7xMLFDaJO%40dinodell

There are some nontrivial concerns in there about breaking the
semantics of existing exclusion constraints, for instance.  I think
we mostly rejected the concern about invalidation of cached plans
as already-covered, but that wasn't the only problem.

However, I think we could largely ignore the issues if we restricted
ALTER OPERATOR to only add commutator, negator, hashes, or merges
properties to operators that lacked them before --- which'd be the
primary if not only use-case anyway.  That direction can't break
anything.

regards, tom lane




Re: [PATCH] ltree hash functions

2023-06-17 Thread Tomas Vondra
Hi,

I've created a CF entry for the patch:

  https://commitfest.postgresql.org/43/4375/

I only briefly skimmed the code, so a couple comments.

On 6/17/23 17:45, Tommy Pavlicek wrote:
> Hi All,
> 
> I've written a patch to add hash functions for the ltree extension. It
> adds support for hash indexes and hash aggregation. I've reused the
> existing logic that's used to hash arrays and added tests that mirror
> elsewhere (i.e. hstore and hash_func regression tests).
> 

Reusing code/logic is the right approach, IMHO.

> The patch doesn't currently support hash joins as the ltree = operator
> was created without support for it. The ALTER OPERATOR command doesn't
> support changing the hash join support, so I'm not sure what the best
> strategy to change it is. Is it ok to update the operator's row in the
> pg_operator system catalog or is there a better way to change this that
> someone could recommend?
> 

I guess the "correct" solution would be to extend ALTER OPERATOR. I
wonder why it's not supported - it's clearly an intentional decision
(per comment in AlterOperator). So what might break if this changes for
an existing operator?

FWIW the CREATE OPERATOR documentation only talks about hash joins for
HASHES, maybe it should be updated to also mention hash aggregates?

> Any comments on the overall approach or other feedback would be appreciated.
> 

I wonder what's the use case for this. I wonder how often people join on
ltree, for example. Did you just notice ltree can't hash and decided to
fix that, or do you have a practical use case / need for this feature?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[PATCH] ltree hash functions

2023-06-17 Thread Tommy Pavlicek
Hi All,

I've written a patch to add hash functions for the ltree extension. It adds
support for hash indexes and hash aggregation. I've reused the existing
logic that's used to hash arrays and added tests that mirror elsewhere
(i.e. hstore and hash_func regression tests).

The patch doesn't currently support hash joins as the ltree = operator was
created without support for it. The ALTER OPERATOR command doesn't support
changing the hash join support, so I'm not sure what the best strategy to
change it is. Is it ok to update the operator's row in the pg_operator
system catalog or is there a better way to change this that someone could
recommend?

Any comments on the overall approach or other feedback would be appreciated.

Thanks,
Tommy


ltree-hash-v1.patch
Description: Binary data