Re: PG Schema to be used as log and monitoring store

2017-12-10 Thread Stefan Keller
Hi,

2017-12-10 2:53 GMT+01:00 John R Pierce:
> I believe I would use boolean, not bit.

Agreed.

2017-12-10 10:01 GMT+01:00 Thomas Kellerer:
> Did you try to use a (single) hstore or jsonb column instead where the
> attribute name is the key?

Thought about that and I'm using hstore extensively with OpenStreetMap
data in my PostGIS Terminal.

But it "hurts" really with key-values of type text (with numeric
optimization)...

1. to see boolean values stored as text (see the sketch below)
2. to see tables m1 and m2 which have a relatively unchanging schema
(since the sensors are hardware)
3. and knowing that m1 and m2 have a large common set of sensors
(i.e. a common set of attributes).
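
For illustration, a minimal example of point 1 (a sketch; hstore keeps
every value as text, including booleans):

  -- hstore values are always text, even for booleans
  create extension if not exists hstore;
  select 'b1=>true'::hstore -> 'b1';              -- returns 'true' as text
  select pg_typeof('b1=>true'::hstore -> 'b1');   -- text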

:Stefan


2017-12-10 10:01 GMT+01:00 Thomas Kellerer:
>> I actually made some tests on my own (using generate_series) and did
>> not find any disk space or performance issues yet.
>> I've also found this paper from 2012 about "Sensor Data Storage
>> Performance: SQL or NoSQL, Physical or Virtual"  [2] which confirms my
>> observations.
>>
>> Now, you have to know that there are about 100 attributes for the
>> machines/tables - not only 40 - so I initially thought, it's easier to
>> setup the schema using bit(50) and float8[50].
>
>
> Did you try to use a (single) hstore or jsonb column instead where the
> attribute name is the key?
>
> You'd lose some type safety, but those data types can be compressed, so that
> might be worth the trade off
>
> Thomas
>
>
>



Re: PG Schema to be used as log and monitoring store

2017-12-10 Thread Thomas Kellerer

I actually made some tests on my own (using generate_series) and did
not find any disk space or performance issues yet.
I've also found this paper from 2012 about "Sensor Data Storage
Performance: SQL or NoSQL, Physical or Virtual" [2] which confirms my
observations.

Now, you have to know that there are about 100 attributes for the
machines/tables - not just 40 - so I initially thought it would be
easier to set up the schema using bit(50) and float8[50].


Did you try to use a (single) hstore or jsonb column instead where the 
attribute name is the key?

You'd lose some type safety, but those data types can be compressed, so that
might be worth the trade-off.
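
A minimal sketch of that idea (the table and key names here are made up):

  create table m_log (
    id bigint primary key,
    created timestamp,
    sensors jsonb   -- e.g. {"b1": true, "f1": 3.14, ...}
  );

  -- the attribute name is the key; large jsonb values get TOAST compression
  select (sensors ->> 'f1')::float8 from m_log;
  select * from m_log where (sensors ->> 'b1')::boolean;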

Thomas





Re: PG Schema to be used as log and monitoring store

2017-12-09 Thread John R Pierce

On 12/9/2017 5:46 PM, Stefan Keller wrote:

Below I re-modeled it as a relational schema, as you suggested, and
also tried to utilize the INHERITS feature.
Does that look better?



I believe I would use boolean, not bit.
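
For comparison, a small made-up example of the difference:

  -- boolean has true/false semantics and standard operators;
  -- bit(1) is a bit string and needs bit-string literals
  create table t (flag_bool boolean, flag_bit bit(1));
  insert into t values (true, b'1');
  select * from t where flag_bool;          -- natural
  select * from t where flag_bit = b'1';    -- literal/cast needed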


--
john r pierce, recycling bits in santa cruz




Re: PG Schema to be used as log and monitoring store

2017-12-09 Thread Stefan Keller
Hi

Thanks James and Steven!
I hoped somebody would advise me not to do this.
I was just bothered by NoSQL databases.
Even TimescaleDB made me wonder, because it says it scales Postgres [1]
for IoT, which implies that Postgres does not scale...

2017-12-09 23:01 GMT+01:00 Steven Lembark:
> In general this is a bad idea *unless* you have benchmarked the
> database and found that the amount of space saved really does make
> some difference.

I actually made some tests on my own (using generate_series) and did
not find any disk space or performance issues yet.
I've also found this paper from 2012 about "Sensor Data Storage
Performance: SQL or NoSQL, Physical or Virtual" [2] which confirms my
observations.

Now, you have to know that there are about 100 attributes for the
machines/tables - not just 40 - so I initially thought it would be
easier to set up the schema using bit(50) and float8[50].

Below I re-modeled it as a relational schema, as you suggested, and
also tried to utilize the INHERITS feature.
Does that look better?

:Stefan

[1] https://blog.timescale.com/choose-postgresql-for-iot-19688efc60ca
[2] https://www.ceid.upatras.gr/webpages/faculty/vasilis/Courses/SpatialTemporalDM/Papers/SQLorNoSQL2012.pdf


/*  Pure relational logging and monitoring schema */

  create table m_meta (
    id int primary key,
    name text
  );

  drop table if exists m cascade;
  create table m (
    id bigint primary key,
    gid int references m_meta not null,
    created timestamp,
    b1 bit,
    b2 bit,
    b3 bit,
    -- b4 .. b20 bit
    f1 float8,
    f2 float8,
    f3 float8
    -- f4 .. f20 float8
  );

  create table m1 (
    b21 bit,
    -- b22 .. b50 bit
    f21 float8,
    -- f22 .. f50 float8
    primary key (id),
    foreign key (gid) references m_meta
  ) inherits (m);

  -- create table m2 ( ... ) inherits (m);

/* end */
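
A quick usage sketch of the schema above (the values are made up):

  insert into m_meta values (1, 'machine m1');
  insert into m1 (id, gid, created, b1, b21, f1, f21)
  values (1, 1, now(), b'1', b'0', 3.14, 2.71);

  -- a select on the parent m also returns the rows of m1 (and other children);
  -- note that the primary key on m does not enforce uniqueness across children
  select id, created, b1, f1 from m;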


2017-12-09 23:01 GMT+01:00 Steven Lembark:
> On Sat, 9 Dec 2017 20:22:02 +0100
> Stefan Keller  wrote:
>
>>   create table m1 (
>> id bigint,
>> created timestamp,
>> b20 bit(20) default b'',
>> farr20 float8[20]
>>   );
>
> In general this is a bad idea *unless* you have benchmarked the
> database and found that the amount of space saved really does make
> some difference. Using the packed format makes most SQL a lot harder
> to write and makes indexing impossible (or at least messy and rather
> error prone). This also makes adding add'l fields harder.
>
> If you were really intent on doing this I'd add a few million recods
> with both formats on a database tuned to handle the load and see if
> the packed bits really do make a difference. My guess is that you
> won't see all that much difference in storage and the query speed
> with effective indexing is going to be decent.
>
> Using this database might be a lot simpler with a few that
> breaks the sub-fields out, or which has indexes on the sub
> -fields within the packed data.
>
>
> --
> Steven Lembark   1505 National Ave
> Workhorse Computing Rockford, IL 61103
> lemb...@wrkhors.com+1 888 359 3508
>



Re: PG Schema to be used as log and monitoring store

2017-12-09 Thread Steven Lembark
On Sat, 9 Dec 2017 20:22:02 +0100
Stefan Keller wrote:

>   create table m1 (
> id bigint,
> created timestamp,
> b20 bit(20) default b'',
> farr20 float8[20]
>   );

In general this is a bad idea *unless* you have benchmarked the 
database and found that the amount of space saved really does make
some difference. Using the packed format makes most SQL a lot harder
to write and makes indexing impossible (or at least messy and rather
error prone). This also makes adding add'l fields harder.

If you were really intent on doing this I'd add a few million records
with both formats on a database tuned to handle the load and see if 
the packed bits really do make a difference. My guess is that you
won't see all that much difference in storage and the query speed
with effective indexing is going to be decent.

Using this database might be a lot simpler with a view that breaks
the sub-fields out, or which has indexes on the sub-fields within
the packed data.
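
For illustration, a sketch of such indexes and a view against the packed
m1 table from the original mail (the index and view names are made up):

  -- expression indexes on individual packed sub-fields
  create index m1_f1_idx on m1 ((farr20[1]));
  create index m1_b1_idx on m1 ((get_bit(b20, 0)));

  -- a view that breaks the sub-fields out into named columns
  create view m1_unpacked as
  select id,
         created,
         get_bit(b20, 0) = 1 as b1,
         get_bit(b20, 1) = 1 as b2,
         farr20[1] as f1,
         farr20[2] as f2
  from m1;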


-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com   +1 888 359 3508



Re: PG Schema to be used as log and monitoring store

2017-12-09 Thread James Keener
My initial inclination is to always build the simplest-to-understand system 
first. Space is cheap and pg is pretty efficient; engineering time is expensive, 
and debugging time doubly so, with a side of anxiety when production goes down. 
Also, it will allow more flexibility later on to describe your current setup 
semantically rather than in a condensed form.

I would suggest building a simpler schema and benchmarking insert times and 
storage space.

If you go with your condensed form, I would suggest writing functions to 
extract any data that is more complicated than one expression involving one 
field.
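
For instance, a minimal sketch of such an extraction function, assuming
the condensed m1 table from the original post (the function name is made up):

  -- returns the n-th float sensor reading from the packed array
  create function m1_float_sensor(rec m1, n int)
  returns float8 language sql immutable
  as $$ select (rec.farr20)[n]; $$;

  -- usage: select m1_float_sensor(m1, 3) from m1;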

Just my 2ยข.

Best wishes,

Jim

On December 9, 2017 2:22:02 PM EST, Stefan Keller wrote:
>Hi,
>
>Given this kind of sensors (Internet-of-Things) log and monitoring
>scenario:
>
>* There are 3 production machines monitored every few seconds for
>forthcoming (~2) years.
>* Machine m1 is emitting 20 boolean and 20 float4 captured in sensors
>(m1s1..m1s40).
>* Machine m2 has same attributes as m1 plus 10+10 more (m2s1..m2s20).
>* Machine m3: like m2 but half of the attributes are different.
>* Queries are happening once every day, like:
>SELECT m1s1,m1s2 FROM m1 WHERE logged BETWEEN '2017-11-01' AND
>'2017-11-30'".
>
>So this is a kind of an "Immutable DB" with where there are
>* rather static schema with sources which have overlapping attributes
>* heavy writes,
>* periodic reads
>
>Would you model this schema also like my proposition, which saves
>place but makes it little bit more complex to insert/update due to the
>arrays?
>
>  create table m1 (
>id bigint,
>created timestamp,
>b20 bit(20) default b'',
>farr20 float8[20]
>  );
>
>:Stefan

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

PG Schema to be used as log and monitoring store

2017-12-09 Thread Stefan Keller
Hi,

Given this kind of sensor (Internet-of-Things) log and monitoring scenario:

* There are 3 production machines monitored every few seconds for the
forthcoming (~2) years.
* Machine m1 emits 20 boolean and 20 float4 values, captured by sensors
(m1s1..m1s40).
* Machine m2 has the same attributes as m1 plus 10+10 more (m2s1..m2s20).
* Machine m3: like m2 but half of the attributes are different.
* Queries are happening once every day, like:
  SELECT m1s1,m1s2 FROM m1 WHERE logged BETWEEN '2017-11-01' AND '2017-11-30'.

So this is a kind of an "Immutable DB" where there are
* rather static schema with sources which have overlapping attributes
* heavy writes,
* periodic reads

Would you also model this schema like my proposal, which saves
space but makes it a little more complex to insert/update due to the
arrays?

  create table m1 (
    id bigint,
    created timestamp,
    b20 bit(20) default b'00000000000000000000',
    farr20 float8[20]
  );
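
For a rough size/insert benchmark, the table could be filled with
generate_series, e.g. (row count and value distribution are arbitrary):

  insert into m1 (id, created, b20, farr20)
  select g,
         now() - g * interval '1 second',
         (floor(random() * 2^20)::int)::bit(20),
         array_fill(random(), array[20])  -- one value repeated; enough for a size test
  from generate_series(1, 1000000) as g;

  select pg_size_pretty(pg_total_relation_size('m1'));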

:Stefan