On Mon, Nov 30, 2020 at 11:39 PM Alexander Korotkov
<[email protected]> wrote:
> On Mon, Nov 30, 2020 at 10:35 PM Alexander Korotkov
> <[email protected]> wrote:
> > On Sun, Nov 29, 2020 at 11:53 PM Paul A Jungwirth
> > <[email protected]> wrote:
> > >
> > > On Sun, Nov 29, 2020 at 11:43 AM Alexander Korotkov
> > > <[email protected]> wrote:
> > > > Thank you. Could you please, update doc/src/sgml/catalogs.sgml,
> > > > because pg_type and pg_range catalogs are updated.
> > >
> > > Attached! :-)
> >
> > You're quick, thank you. Please, also take a look at cfbot failure
> > https://travis-ci.org/github/postgresql-cfbot/postgresql/builds/746623942
> > I've tried to reproduce it, but didn't manage yet.
>
> Got it. type_sanity test fails on any platform, you just need to
> repeat "make check" till it fails.
>
> The failed query checked consistency of range types, but it didn't
> take into account ranges of domains and ranges of records, which are
> exercised by multirangetypes test running in parallel. We could teach
> this query about such kinds of ranges, but I think that would be
> overkill, because we're not going to introduce such builtin ranges
> yet. So, I'm going to just move multirangetypes test into another
> group of parallel tests.
I also found a problem in multirange types naming logic. Consider the
following example.
create type a_multirange AS (x float, y float);
create type a as range(subtype=text, collation="C");
create table tbl (x __a_multirange);
drop type a_multirange;
If you dump this database, the dump couldn't be restored. The
multirange type is named __a_multirange, because the type named
a_multirange already exists. However, it might appear that
a_multirange type is already deleted. When the dump is restored, a
multirange type is named a_multirange, and the corresponding table
fails to be created. The same thing doesn't happen with arrays,
because arrays are not referenced in dumps by their internal names.
I think we probably should add an option to specify multirange type
names while creating a range type. Then dump can contain exact type
names used in the database, and restore wouldn't have a names
collision.
Another thing that worries me is the multirange serialization format.
typedef struct
{
int32 vl_len_; /* varlena header */
char flags; /* range flags */
char _padding; /* Bounds must be aligned */
/* Following the header are zero to two bound values. */
} ShortRangeType;
Comment says this structure doesn't contain a varlena header, while
structure obviously has it.
In general, I wonder if we can make the binary format of multiranges
more efficient. It seems that every function involving multiranges
from multirange_deserialize(). I think we can make functions like
multirange_contains_elem() much more efficient. Multirange is
basically an array of ranges. So we can pack it as follows.
1. Typeid and rangecount
2. Tightly packed array of flags (1-byte for each range)
3. Array of indexes of boundaries (4-byte for each range). Or even
better we can combine offsets and lengths to be compression-friendly
like jsonb JEntry's do.
4. Boundary values
Using this format, we can implement multirange_contains_elem(),
multirange_contains_range() without deserialization and using binary
search. That would be much more efficient. What do you think?
------
Regards,
Alexander Korotkov