Re: Proposal: Adjacent B-Tree index

2024-02-20 Thread Dilshod Urazov
> only 1 index lookup is needed.
Sorry, must be "only lookups of 1 index are needed".

--
Dilshod Urazov

вт, 20 февр. 2024 г. в 21:09, Dilshod Urazov :

> > I'm not sure why are two indexes not sufficient here?
>
> Did I write that they are not sufficient? The whole point is that in
> relational DBMSs which are widely used
> to store graphs we can optimize storage in such cases. Also we can
> optimize traversals e.g. if we want to
> get all nodes that are adjacent to a given node with id = X in an oriented
> graph
>
> SELECT id, label
> FROM Nodes
> JOIN Edges ON Nodes.id = Edges.target
> WHERE Edges.source = X;
>
> only 1 index lookup is needed.
>
> > The entry could've been removed because (e.g.)
> > test's b column was updated thus inserting a new index entry for the
> > new HOT-chain's TID.
>
> If test'b column was updated and HOT optimization took place no new index
> entry is created. Index tuple
> pointing to old heap tuple is valid since now it is pointing to HOT-chain.
>
> --
> Dilshod Urazov
>
> пн, 19 февр. 2024 г. в 22:32, Matthias van de Meent <
> boekewurm+postg...@gmail.com>:
>
>> On Mon, 19 Feb 2024 at 18:48, Dilshod Urazov 
>> wrote:
>> >
>> > - Motivation
>> >
>> > A regular B-tree index provides efficient mapping of key values to
>> tuples within a table. However, if you have two tables connected in some
>> way, a regular B-tree index may not be efficient enough. In this case, you
>> would need to create an index for each table. The purpose will become
>> clearer if we consider a simple example which is the main use-case as I see
>> it.
>>
>> I'm not sure why are two indexes not sufficient here? PostgreSQL can
>> already do merge joins, which would have the same effect of hitting
>> the same location in the index at the same time between all tables,
>> without the additional overhead of having to scan two table's worth of
>> indexes in VACUUM.
>>
>> > During the vacuum of A an index tuple pointing to a dead tuple in A
>> should be cleaned as well as all index tuples for the same key.
>>
>> This is definitely not correct. If I have this schema
>>
>> table test (id int primary key, b text unique)
>> table test_ref(test_id int references test(id))
>>
>> and if an index would contain entries for both test and test_ref, it
>> can't just remove all test_ref entries because an index entry with the
>> same id was removed: The entry could've been removed because (e.g.)
>> test's b column was updated thus inserting a new index entry for the
>> new HOT-chain's TID.
>>
>> > would suffice for this new semantics.
>>
>> With the provided explanation I don't think this is a great idea.
>>
>> Kind regards,
>>
>> Matthias van de Meent.
>>
>


Re: Proposal: Adjacent B-Tree index

2024-02-20 Thread Dilshod Urazov
> I'm not sure why are two indexes not sufficient here?

Did I write that they are not sufficient? The whole point is that in
relational DBMSs which are widely used
to store graphs we can optimize storage in such cases. Also we can optimize
traversals e.g. if we want to
get all nodes that are adjacent to a given node with id = X in an oriented
graph

SELECT id, label
FROM Nodes
JOIN Edges ON Nodes.id = Edges.target
WHERE Edges.source = X;

only 1 index lookup is needed.

> The entry could've been removed because (e.g.)
> test's b column was updated thus inserting a new index entry for the
> new HOT-chain's TID.

If test'b column was updated and HOT optimization took place no new index
entry is created. Index tuple
pointing to old heap tuple is valid since now it is pointing to HOT-chain.

--
Dilshod Urazov

пн, 19 февр. 2024 г. в 22:32, Matthias van de Meent <
boekewurm+postg...@gmail.com>:

> On Mon, 19 Feb 2024 at 18:48, Dilshod Urazov 
> wrote:
> >
> > - Motivation
> >
> > A regular B-tree index provides efficient mapping of key values to
> tuples within a table. However, if you have two tables connected in some
> way, a regular B-tree index may not be efficient enough. In this case, you
> would need to create an index for each table. The purpose will become
> clearer if we consider a simple example which is the main use-case as I see
> it.
>
> I'm not sure why are two indexes not sufficient here? PostgreSQL can
> already do merge joins, which would have the same effect of hitting
> the same location in the index at the same time between all tables,
> without the additional overhead of having to scan two table's worth of
> indexes in VACUUM.
>
> > During the vacuum of A an index tuple pointing to a dead tuple in A
> should be cleaned as well as all index tuples for the same key.
>
> This is definitely not correct. If I have this schema
>
> table test (id int primary key, b text unique)
> table test_ref(test_id int references test(id))
>
> and if an index would contain entries for both test and test_ref, it
> can't just remove all test_ref entries because an index entry with the
> same id was removed: The entry could've been removed because (e.g.)
> test's b column was updated thus inserting a new index entry for the
> new HOT-chain's TID.
>
> > would suffice for this new semantics.
>
> With the provided explanation I don't think this is a great idea.
>
> Kind regards,
>
> Matthias van de Meent.
>


Re: Proposal: Adjacent B-Tree index

2024-02-19 Thread Matthias van de Meent
On Mon, 19 Feb 2024 at 18:48, Dilshod Urazov  wrote:
>
> - Motivation
>
> A regular B-tree index provides efficient mapping of key values to tuples 
> within a table. However, if you have two tables connected in some way, a 
> regular B-tree index may not be efficient enough. In this case, you would 
> need to create an index for each table. The purpose will become clearer if we 
> consider a simple example which is the main use-case as I see it.

I'm not sure why are two indexes not sufficient here? PostgreSQL can
already do merge joins, which would have the same effect of hitting
the same location in the index at the same time between all tables,
without the additional overhead of having to scan two table's worth of
indexes in VACUUM.

> During the vacuum of A an index tuple pointing to a dead tuple in A should be 
> cleaned as well as all index tuples for the same key.

This is definitely not correct. If I have this schema

table test (id int primary key, b text unique)
table test_ref(test_id int references test(id))

and if an index would contain entries for both test and test_ref, it
can't just remove all test_ref entries because an index entry with the
same id was removed: The entry could've been removed because (e.g.)
test's b column was updated thus inserting a new index entry for the
new HOT-chain's TID.

> would suffice for this new semantics.

With the provided explanation I don't think this is a great idea.

Kind regards,

Matthias van de Meent.




Proposal: Adjacent B-Tree index

2024-02-19 Thread Dilshod Urazov
- Motivation

A regular B-tree index provides efficient mapping of key values to tuples
within a table. However, if you have two tables connected in some way, a
regular B-tree index may not be efficient enough. In this case, you would
need to create an index for each table. The purpose will become clearer if
we consider a simple example which is the main use-case as I see it.

- Example

We need to store a graph. So we create a table for nodes

CREATE TABLE Nodes (
  id SERIAL PRIMARY KEY,
  label VARCHAR(255)
);

and a table for edges

CREATE TABLE Edges (
  label VARCHAR(255),
  source INTEGER REFERENCES Nodes(id),
  target INTEGER REFERENCES Nodes(id)
);

In order to efficiently traverse a graph we would have an index for
Nodes.id which created automatically in this case and an index for
Edges.source.

- Tweaked B-Tree

We could optimize cases like the former by modifying PostgreSQL btree index
to allow it to index 2 tables simultaneously.

Semantically it would be UNIQUE index for attribute x of table A and an
index for attribute y in table B. In the non-deduplicated case  index tuple
pointing to a tuple in A should be marked by a flag. In the deduplicated
case first TID in an array can be interpreted as a link to A.
During the vacuum of A an index tuple pointing to a dead tuple in A should
be cleaned as well as all index tuples for the same key. We can reach this
by clearing all index tuples after the dead one until we come to index
tuple marked by a flag with different key. Or we can enforce deduplication
in such index.

In the example with a graph such index would provide PRIMARY KEY for Nodes
and the index for Edges.Source. The query

SELECT * FROM Nodes WHERE id = X;

will use this index and take into account only a TID in Nodes (so this
would be marked index tuple or first TID in a posting list).  The query

SELECT * FROM Edges WHERE source = X;

conversely will ignore links to Nodes.

-- Syntax

I believe that
CREATE TABLE Nodes (
  id SERIAL PRIMARY KEY ADJACENT,
  label VARCHAR(255)
);
CREATE TABLE Edges (
  label VARCHAR(255),
  source INTEGER REFERENCES ADJACENT Nodes(id),
  target INTEGER REFERENCES Nodes(id)
);

would suffice for this new semantics.
--
Dilshod Urazov