Re: Storage for multiple variable-length attributes in a single row

2022-02-07 Thread Esteban Zimanyi
Dear David

There are two approaches for storing temporal information in a relational
database, explored since the 1980s following the work of Richard Snodgrass
(http://www2.cs.arizona.edu/~rts/publications.html):
tuple timestamping vs. attribute timestamping. The SQL standard adopted the
tuple-timestamping approach, but in MobilityDB we decided to use the
attribute-timestamping approach. As you rightly pointed out,
tuple timestamping follows traditional relational normalization theory.

The main advantage of attribute timestamping for mobility data is that
we only need to store the changes in value of a temporal attribute. In
the example of the gear of a car, even if we receive high-frequency
observations, there will be very few gear changes during a trip, while
there will be many more position changes. Therefore, MobilityDB only
stores the changes of value (e.g., no position change is stored
during a red light or a traffic jam), which constitutes a huge lossless
compression with respect to the raw format storing every observation in a
separate row. We have observed lossless compression of up to 450% for real
IoT data.
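The idea can be sketched as follows (a minimal Python illustration, not MobilityDB code): keeping only the observations where the value actually changes loses nothing, because a stepwise attribute such as the gear is constant between changes and can be reconstructed at any timestamp.

```python
def compress_changes(observations):
    """Keep only the observations whose value differs from the previous one."""
    kept = []
    for value, ts in observations:
        if not kept or kept[-1][0] != value:
            kept.append((value, ts))
    return kept

def value_at(compressed, ts):
    """Reconstruct the value at any timestamp: the last change at or before ts."""
    result = None
    for value, t in compressed:
        if t <= ts:
            result = value
        else:
            break
    return result

# 10 Hz observations of the gear: it changes only twice during this second.
obs = [(1, 0.0), (1, 0.1), (1, 0.2), (2, 0.3), (2, 0.4),
       (2, 0.5), (2, 0.6), (3, 0.7), (3, 0.8), (3, 0.9)]
compressed = compress_changes(obs)
print(compressed)                   # 3 stored observations instead of 10
print(value_at(compressed, 0.45))   # 2: the gear between the 0.3s and 0.7s changes
```

The compression ratio grows with the sampling frequency: at 10 Hz, a gear that changes twice per second stores 3 values instead of 10.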

In addition, MobilityDB implements all temporal operations and can
determine the value of any temporal attribute at any timestamp (e.g., using
linear interpolation between observations for speed or GPS position),
independently of the actually stored observations.
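For continuously varying attributes, this works as in the following Python sketch (assumed names, not the MobilityDB API): the value at an arbitrary timestamp is obtained by linear interpolation between the two surrounding stored observations.

```python
from bisect import bisect_right

def interpolate(observations, ts):
    """Linearly interpolate a temporal float at timestamp ts.

    observations: list of (value, timestamp) pairs sorted by timestamp.
    """
    times = [t for _, t in observations]
    if ts < times[0] or ts > times[-1]:
        raise ValueError("timestamp outside the observed period")
    i = bisect_right(times, ts)
    if i == len(times):              # ts equals the last timestamp
        return observations[-1][0]
    (v0, t0), (v1, t1) = observations[i - 1], observations[i]
    if ts == t0:
        return v0
    return v0 + (v1 - v0) * (ts - t0) / (t1 - t0)

# A speed signal stored with only 4 observations: accelerate, cruise, brake.
speed = [(0.0, 0.0), (50.0, 10.0), (50.0, 20.0), (0.0, 30.0)]
print(interpolate(speed, 5.0))    # 25.0, halfway through the acceleration
print(interpolate(speed, 25.0))   # 25.0, halfway through the braking
```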

I hope this clarifies things a little.


Re: Storage for multiple variable-length attributes in a single row

2022-02-07 Thread Julien Rouhaud
On Mon, Feb 07, 2022 at 10:10:53AM -0700, David G. Johnston wrote:
> On Mon, Feb 7, 2022 at 9:58 AM Esteban Zimanyi 
> wrote:
> 
> >
> > As suggested by David, this goes beyond the "traditional" usage of
> > PostgreSQL. Therefore my questions are
> > * What is the suggested strategy to splitting these 2K attributes into
> > vertically partitioned tables where the tables are linked by the primary
> > key (e.g. trip number in the example above). Are there any limitations/best
> > practices in the number/size of TOASTED attributes that a table should
> > contain.
> > * In each partitioned table containing N TOASTED attributes, given the
> > above requirements, are there any limitations/best practices in storing
> > them using extended storage or an alternative one such as external.
> >
> >
> Frankly, the best practice is "don't have that many columns".  Since you
> do, I posit that you are just going to have to make decisions (possibly
> with experimentation) on your own.  Or maybe ask around on a MobilityDB
> forum what people using that tool and having these kinds of data structures
> do.  From a core PostgreSQL perspective you've already deviated from the
> model structures that it was designed with in mind.
> I'm really confused that you'd want the data value itself to contain a
> timestamp that, on a per-row basis, should be the same timestamp that every
> other value on the row has.  Extracting the timestamp to its own column and
> using simpler and atomic data types is how core PostgreSQL and the
> relational model normalization recommend dealing with this situation.  Then
> you just break up the attributes of a similar nature into their own tables
> based upon their shared nature.  In almost all cases relying on "main"
> storage.

Actually looking at the original example:

>CREATE TYPE tint (
>  internallength = variable,
>  [...]
>  storage = extended,
>  alignment = double,
>  [...]
>);

I'm wondering if it's just some miscommunication here.  If the tint data type
only needs to hold a timestamp and an int, I don't see why it would be
varlena at all.

So if a single tint can internally hold thousands of (int, timestamptz) pairs,
a bit like pgpointcloud, then having it by default extended (so both possibly
out-of-line and compressed) seems like a good idea: you can definitely hit
the 8k boundary, it should compress nicely, and you also avoid quite a lot of
per-tuple header overhead.
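That last saving can be put in rough numbers, as in the back-of-envelope Python sketch below (the 30K observations per trip figure is from the thread; the 23-byte heap tuple header and 4-byte line pointer are PostgreSQL's documented per-row costs, here with the header aligned to 24 bytes and page/TOAST overhead ignored):

```python
# Per-row overhead when each observation is its own row, vs. one row
# per trip with a single varlena value holding all the observations.
OBS_PER_TRIP = 30_000   # average observations per trip (from the thread)
HEADER = 24             # 23-byte heap tuple header, aligned to 24 bytes
LINE_POINTER = 4        # per-tuple item pointer in the page
PAYLOAD = 12            # e.g. a 4-byte int plus an 8-byte timestamptz

one_row_per_obs = OBS_PER_TRIP * (HEADER + LINE_POINTER + PAYLOAD)
one_row_per_trip = HEADER + LINE_POINTER + OBS_PER_TRIP * PAYLOAD

print(one_row_per_obs)    # 1_200_000 bytes for the tuple-per-observation layout
print(one_row_per_trip)   # 360_028 bytes: the header is paid once, not 30K times
```

So even before compression, the aggregated layout saves roughly 70% here, simply by paying the tuple overhead once per trip.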




Re: Storage for multiple variable-length attributes in a single row

2022-02-07 Thread David G. Johnston
On Mon, Feb 7, 2022 at 9:58 AM Esteban Zimanyi 
wrote:

>
> As suggested by David, this goes beyond the "traditional" usage of
> PostgreSQL. Therefore my questions are
> * What is the suggested strategy to splitting these 2K attributes into
> vertically partitioned tables where the tables are linked by the primary
> key (e.g. trip number in the example above). Are there any limitations/best
> practices in the number/size of TOASTED attributes that a table should
> contain.
> * In each partitioned table containing N TOASTED attributes, given the
> above requirements, are there any limitations/best practices in storing
> them using extended storage or an alternative one such as external.
>
>
Frankly, the best practice is "don't have that many columns".  Since you
do, I posit that you are just going to have to make decisions (possibly
with experimentation) on your own.  Or maybe ask around on a MobilityDB
forum what people using that tool with these kinds of data structures
do.  From a core PostgreSQL perspective you've already deviated from the
structures it was designed with in mind.
I'm really confused that you'd want the data value itself to contain a
timestamp that, on a per-row basis, should be the same timestamp that every
other value on the row has.  Extracting the timestamp to its own column and
using simpler, atomic data types is how core PostgreSQL and
relational normalization recommend dealing with this situation.  Then
you just break up the attributes of a similar nature into their own tables
based upon their shared nature.  In almost all cases relying on "main"
storage.

David J.


Re: Storage for multiple variable-length attributes in a single row

2022-02-07 Thread Esteban Zimanyi
Many thanks for your prompt reply David. Allow me then to restate the
questions, hoping that this better fits this mailing list.

MobilityDB is a time-series extension to PostgreSQL/PostGIS in which
time-varying attributes (e.g., the gear or the GPS location of a car) are
semantically grouped into "units" (e.g., a trip of a car) and are stored as
temporal functions, e.g., a set of pairs (integer, timestamptz) for the gear
(a temporal integer) or a set of triples (lon, lat, timestamptz) for the
GPS location (a temporal point). All temporal types are stored using
extended storage, e.g.,
CREATE TYPE tint (
  internallength = variable,
  [...]
  storage = extended,
  alignment = double,
  [...]
);
When ingesting mobility (IoT) data into MobilityDB we receive very wide
(2K attributes), high-frequency (every tenth of a second) data in flat
format (e.g., CSV) and we need to store it in PostgreSQL tables using
MobilityDB temporal types. In this scenario, the values of these temporal
types can be very wide (on average 30K timestamped pairs/triples per trip).
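The transformation can be pictured as in the simplified Python sketch below (made-up column names, not the actual MobilityDB loader): flat per-observation CSV rows are grouped by trip, and each signal column becomes one temporal value, i.e. a list of (value, timestamp) pairs.

```python
from collections import defaultdict

# Flat rows as they would come from the CSV: one row per observation,
# one column per signal (2K columns in the real data, 2 here).
rows = [
    {"trip": 1, "ts": 0.0, "gear": 1, "speed": 0.0},
    {"trip": 1, "ts": 0.1, "gear": 1, "speed": 3.5},
    {"trip": 1, "ts": 0.2, "gear": 2, "speed": 7.0},
    {"trip": 2, "ts": 0.0, "gear": 1, "speed": 0.0},
]
signals = ["gear", "speed"]

# One output record per trip; each signal becomes a list of (value, ts) pairs,
# i.e. the payload of one temporal attribute (tint, tfloat, ...).
trips = defaultdict(lambda: {s: [] for s in signals})
for row in rows:
    for s in signals:
        trips[row["trip"]][s].append((row[s], row["ts"]))

print(trips[1]["gear"])   # [(1, 0.0), (1, 0.1), (2, 0.2)]
```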

As suggested by David, this goes beyond the "traditional" usage of
PostgreSQL. Therefore my questions are:
* What is the suggested strategy for splitting these 2K attributes into
vertically partitioned tables linked by the primary key (e.g., the trip
number in the example above)? Are there any limitations/best practices on
the number/size of TOASTed attributes that a table should contain?
* In each partitioned table containing N TOASTed attributes, given the
above requirements, are there any limitations/best practices in storing
them using extended storage or an alternative one such as external?

Many thanks for your insight

Esteban


Re: Storage for multiple variable-length attributes in a single row

2022-02-07 Thread David G. Johnston
On Mon, Feb 7, 2022 at 8:44 AM Esteban Zimanyi 
wrote:

> May I kindly ask your insight about a question I posted 1 month ago and
> for which I never received any answer ?
>

-hackers really isn't the correct place for usage questions like this -
even if you are creating a custom type (why you are doing this is left
unstated; I have no idea what it means to be a temporally extended version
of boolean, etc.)

You should read up on how TOAST works in PostgreSQL since that is what
handles large length data values.

If I understand your setup correctly, you are claiming to have 2K or more
columns.  This well exceeds PostgreSQL's limit (1600 columns per table).
A "tablespace" is a particular piece of functionality provided by the
server.  You are using "table space" in a different sense and I'm unsure
exactly what you mean; I presume "cell".  PostgreSQL has row-oriented
storage (our internals documentation covers this).

I think your mention of MobilityDB also complicates receiving a useful
response, as this list is for the core project.  That you can exceed the
column count limit suggests that your environment is different enough from
core that you should be asking there.

You will need to wait for someone else to specifically answer the extended
storage question though - but I suspect you've provided insufficient
details in that regard.

David J.


Re: Storage for multiple variable-length attributes in a single row

2022-02-07 Thread Esteban Zimanyi
Dear all

May I kindly ask your insight about a question I posted one month ago and
for which I never received any answer?

Many thanks

On Thu, Jan 6, 2022 at 4:05 PM Esteban Zimanyi 
wrote:

> Dear all
>
> When ingesting mobility (IoT) data into MobilityDB
> https://mobilitydb.com/
> we transform very wide (2K attributes) car mobility data of high frequency
> (every tenth of a second) from flat format (e.g. CSV) into MobilityDB
> format in which there is a single record per trip and each of the signals
> is transformed into a temporal attribute (tbool, tint, tfloat, ttext,
> tgeompoint, tgeogpoint), which are temporal extensions of the corresponding
> PostgreSQL/PostGIS base types (bool, int, float, text, geometry,
> geography). All temporal types are stored using extended format, e.g.,
> CREATE TYPE tfloat (
>   internallength = variable,
>   [...]
>   storage = extended,
>   alignment = double,
>   [...]
> );
>
> Given that each temporal value can be very wide (on average 30K
> timestamped  points/floats/text/... per trip) our first question is
> * Is extended the right storage for this ?
>
> Our second question is how all the 2K temporal attributes are stored,
> which may be
> * on a single table space
> * in one table space per attribute
> which in other words, relates to the question row vs column storage.
>
> Many thanks for your insight
>
> Esteban
>
>


Storage for multiple variable-length attributes in a single row

2022-01-06 Thread Esteban Zimanyi
Dear all

When ingesting mobility (IoT) data into MobilityDB
https://mobilitydb.com/
we transform very wide (2K attributes) car mobility data of high frequency
(every tenth of a second) from flat format (e.g., CSV) into MobilityDB
format, in which there is a single record per trip and each of the signals
is transformed into a temporal attribute (tbool, tint, tfloat, ttext,
tgeompoint, tgeogpoint), which are temporal extensions of the corresponding
PostgreSQL/PostGIS base types (bool, int, float, text, geometry,
geography). All temporal types are stored using extended storage, e.g.,
CREATE TYPE tfloat (
  internallength = variable,
  [...]
  storage = extended,
  alignment = double,
  [...]
);

Given that each temporal value can be very wide (on average 30K
timestamped points/floats/texts/... per trip), our first question is:
* Is extended the right storage for this?

Our second question is how all the 2K temporal attributes are stored, which
may be:
* in a single table space
* in one table space per attribute
which, in other words, relates to the question of row vs. column storage.

Many thanks for your insight

Esteban