Re: PG Schema to be used as log and monitoring store
Hi,

2017-12-10 2:53 GMT+01:00 John R Pierce :
> I believe I would use boolean, not bit.

Agreed.

2017-12-10 10:01 GMT+01:00 Thomas Kellerer :
> Did you try to use a (single) hstore or jsonb column instead where the
> attribute name is the key?

Thought about that, and I'm using hstore extensively with OpenStreetMap
data in my PostGIS Terminal. But it really "hurts" with key-values of
type text (even with numeric optimization)...

1. to see boolean values stored as text,
2. to see tables m1 and m2, which have a relatively unchanging schema
   (since the sensors are hardware), and
3. knowing that m1 and m2 have a large common set of sensors (i.e. a
   common set of attributes).

:Stefan

2017-12-10 10:01 GMT+01:00 Thomas Kellerer :
>> I actually made some tests on my own (using generate_series) and did
>> not find any disk space or performance issues yet.
>> I've also found this paper from 2012 about "Sensor Data Storage
>> Performance: SQL or NoSQL, Physical or Virtual" [2] which confirms my
>> observations.
>>
>> Now, you have to know that there are about 100 attributes for the
>> machines/tables - not only 40 - so I initially thought it's easier to
>> set up the schema using bit(50) and float8[50].
>
> Did you try to use a (single) hstore or jsonb column instead where the
> attribute name is the key?
>
> You'd lose some type safety, but those data types can be compressed,
> so that might be worth the trade-off.
>
> Thomas
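P.S. For comparison, a minimal sketch of the jsonb variant (hypothetical
table and key names). Unlike hstore, jsonb at least keeps booleans and
numbers typed inside the document:

  -- one row per reading, sensor attributes as jsonb keys
  create table m_log (
    id      bigint primary key,
    created timestamp,
    vals    jsonb   -- e.g. {"b1": true, "f1": 20.5}
  );

  insert into m_log values (1, now(), '{"b1": true, "f1": 20.5}');

  select vals->'b1' from m_log;             -- jsonb true, not the text 't'
  select (vals->>'f1')::float8 from m_log;  -- numbers still need a cast on the way out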
Re: PG Schema to be used as log and monitoring store
> I actually made some tests on my own (using generate_series) and did
> not find any disk space or performance issues yet.
> I've also found this paper from 2012 about "Sensor Data Storage
> Performance: SQL or NoSQL, Physical or Virtual" [2] which confirms my
> observations.
>
> Now, you have to know that there are about 100 attributes for the
> machines/tables - not only 40 - so I initially thought it's easier to
> set up the schema using bit(50) and float8[50].

Did you try to use a (single) hstore or jsonb column instead where the
attribute name is the key?

You'd lose some type safety, but those data types can be compressed, so
that might be worth the trade-off.

Thomas
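P.S. A minimal sketch of what I mean, with hypothetical table and key
names (hstore shown; jsonb works the same way and keeps values typed):

  create extension if not exists hstore;

  -- one row per reading, sensor name -> value; hstore values are always text
  create table m_kv (
    id      bigint primary key,
    created timestamp,
    attrs   hstore
  );

  insert into m_kv values (1, now(), 'b1 => "t", f1 => "20.5"');

  -- extraction goes through text and needs explicit casts
  select (attrs->'f1')::float8 from m_kv;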
Re: PG Schema to be used as log and monitoring store
On 12/9/2017 5:46 PM, Stefan Keller wrote:
> Below I re-modeled it to a relational schema as you suggested and also
> tried to utilize the INHERITS feature. Does that look better?

I believe I would use boolean, not bit.

--
john r pierce, recycling bits in santa cruz
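P.S. That is, declare the flags like this (sketch with made-up names);
boolean needs no bit-twiddling in queries:

  create table m (
    id bigint primary key,
    b1 boolean,
    f1 float8
  );

  select * from m where b1 and f1 > 20.0;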
Re: PG Schema to be used as log and monitoring store
Hi,

Thanks James and Steven! I hoped somebody would advise me not to do
this. I was just bothered by NoSQL databases. Even TimescaleDB made me
wonder, because it says it "scales Postgres" [1] for IoT, which implies
that Postgres does not scale...

2017-12-09 23:01 GMT+01:00 Steven Lembark :
> In general this is a bad idea *unless* you have benchmarked the
> database and found that the amount of space saved really does make
> some difference.

I actually made some tests on my own (using generate_series) and did
not find any disk space or performance issues yet.
I've also found this paper from 2012 about "Sensor Data Storage
Performance: SQL or NoSQL, Physical or Virtual" [2] which confirms my
observations.

Now, you have to know that there are about 100 attributes for the
machines/tables - not only 40 - so I initially thought it's easier to
set up the schema using bit(50) and float8[50].

Below I re-modeled it to a relational schema as you suggested and also
tried to utilize the INHERITS feature. Does that look better?

:Stefan

[1] https://blog.timescale.com/choose-postgresql-for-iot-19688efc60ca
[2] https://www.ceid.upatras.gr/webpages/faculty/vasilis/Courses/SpatialTemporalDM/Papers/SQLorNoSQL2012.pdf

/* Pure relational logging and monitoring schema */

create table m_meta (
  id int primary key,
  name text
);

drop table if exists m cascade;

create table m (
  id bigint primary key,
  gid int references m_meta not null,
  created timestamp,
  b1 bit,
  b2 bit,
  b3 bit,
  -- b4 .. b20 bit,
  f1 float8,
  f2 float8,
  f3 float8
  -- f4 .. f20 float8
);

create table m1 (
  b21 bit,
  -- b22 .. b50 bit,
  f21 float8,
  -- f22 .. f50 float8,
  primary key (id),
  foreign key (gid) references m_meta
) inherits (m);

-- create table m2 ( ... ) inherits (m);

/* end */

2017-12-09 23:01 GMT+01:00 Steven Lembark :
> On Sat, 9 Dec 2017 20:22:02 +0100
> Stefan Keller wrote:
>
>> create table m1 (
>>   id bigint,
>>   created timestamp,
>>   b20 bit(20) default b''::bit(20),
>>   farr20 float8[20]
>> );
>
> In general this is a bad idea *unless* you have benchmarked the
> database and found that the amount of space saved really does make
> some difference. Using the packed format makes most SQL a lot harder
> to write and makes indexing impossible (or at least messy and rather
> error prone). This also makes adding additional fields harder.
>
> If you were really intent on doing this I'd add a few million records
> with both formats on a database tuned to handle the load and see if
> the packed bits really do make a difference. My guess is that you
> won't see all that much difference in storage, and the query speed
> with effective indexing is going to be decent.
>
> Using this database might be a lot simpler with a view that breaks
> the sub-fields out, or which has indexes on the sub-fields within
> the packed data.
>
> --
> Steven Lembark                                   1505 National Ave
> Workhorse Computing                              Rockford, IL 61103
> lemb...@wrkhors.com                              +1 888 359 3508
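P.S. Example usage against the schema above (made-up values). Rows
inserted into the child table m1 are also visible through the parent m:

  insert into m_meta values (1, 'machine 1');

  insert into m1 (id, gid, created, b1, b21, f1, f21)
  values (1, 1, now(), b'1', b'0', 20.5, 3.14);

  -- scans m plus every table that inherits from it
  select id, created, b1, f1
  from m
  where created between '2017-11-01' and '2017-12-31';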
Re: PG Schema to be used as log and monitoring store
On Sat, 9 Dec 2017 20:22:02 +0100
Stefan Keller wrote:

> create table m1 (
>   id bigint,
>   created timestamp,
>   b20 bit(20) default b''::bit(20),
>   farr20 float8[20]
> );

In general this is a bad idea *unless* you have benchmarked the
database and found that the amount of space saved really does make
some difference. Using the packed format makes most SQL a lot harder
to write and makes indexing impossible (or at least messy and rather
error prone). This also makes adding additional fields harder.

If you were really intent on doing this I'd add a few million records
with both formats on a database tuned to handle the load and see if
the packed bits really do make a difference. My guess is that you
won't see all that much difference in storage, and the query speed
with effective indexing is going to be decent.

Using this database might be a lot simpler with a view that breaks
the sub-fields out, or which has indexes on the sub-fields within
the packed data.

--
Steven Lembark                                   1505 National Ave
Workhorse Computing                              Rockford, IL 61103
lemb...@wrkhors.com                              +1 888 359 3508
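P.S. Roughly what I mean by breaking the sub-fields out (sketch,
hypothetical names, first two of each field kind only):

  create view m1_unpacked as
  select id, created,
         get_bit(b20, 0) = 1 as s1,
         get_bit(b20, 1) = 1 as s2,
         farr20[1]           as f1,
         farr20[2]           as f2
  from m1;

  -- an expression index on one packed sub-field
  create index on m1 ((farr20[1]));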
Re: PG Schema to be used as log and monitoring store
My initial inclination is to always build the simplest-to-understand
system first. Space is cheap and pg is pretty efficient; engineering
time is expensive, and debugging time doubly so, with a side of anxiety
when production goes down. Also, it will allow more flexibility later
on to describe your current setup semantically rather than in a
condensed form.

I would suggest building a simpler schema and benchmarking insert times
and storage space. If you go with your condensed form, I would suggest
writing functions to extract any data that is more complicated than one
expression involving one field.

Just my 2¢.

Best wishes,

Jim

On December 9, 2017 2:22:02 PM EST, Stefan Keller wrote:
> Hi,
>
> Given this kind of sensor (Internet-of-Things) log and monitoring
> scenario:
>
> * There are 3 production machines monitored every few seconds for the
>   forthcoming (~2) years.
> * Machine m1 emits 20 boolean and 20 float4 values, captured by
>   sensors m1s1..m1s40.
> * Machine m2 has the same attributes as m1 plus 10+10 more
>   (m2s1..m2s20).
> * Machine m3: like m2, but half of the attributes are different.
> * Queries happen about once a day, like:
>   SELECT m1s1, m1s2 FROM m1
>   WHERE logged BETWEEN '2017-11-01' AND '2017-11-30';
>
> So this is a kind of "immutable DB" where there are
> * a rather static schema, with sources that have overlapping
>   attributes,
> * heavy writes, and
> * periodic reads.
>
> Would you also model this schema like my proposition below, which
> saves space but makes inserts and updates a little more complex due
> to the arrays?
>
>   create table m1 (
>     id bigint,
>     created timestamp,
>     b20 bit(20) default b''::bit(20),
>     farr20 float8[20]
>   );
>
> :Stefan

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
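P.S. For example (hypothetical, assuming the packed m1 above), one
small function per derived value keeps the packing out of your queries:

  -- n-th flag of the packed bit field as a proper boolean (n is 1-based)
  create function m1_flag(r m1, n int) returns boolean
  language sql immutable
  as $$ select get_bit(r.b20, n - 1) = 1 $$;

  select created, m1_flag(m1, 3) as s3 from m1;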
PG Schema to be used as log and monitoring store
Hi,

Given this kind of sensor (Internet-of-Things) log and monitoring
scenario:

* There are 3 production machines monitored every few seconds for the
  forthcoming (~2) years.
* Machine m1 emits 20 boolean and 20 float4 values, captured by sensors
  m1s1..m1s40.
* Machine m2 has the same attributes as m1 plus 10+10 more
  (m2s1..m2s20).
* Machine m3: like m2, but half of the attributes are different.
* Queries happen about once a day, like:
  SELECT m1s1, m1s2 FROM m1
  WHERE logged BETWEEN '2017-11-01' AND '2017-11-30';

So this is a kind of "immutable DB" where there are
* a rather static schema, with sources that have overlapping attributes,
* heavy writes, and
* periodic reads.

Would you also model this schema like my proposition below, which saves
space but makes inserts and updates a little more complex due to the
arrays?

  create table m1 (
    id bigint,
    created timestamp,
    b20 bit(20) default b''::bit(20),
    farr20 float8[20]
  );

:Stefan
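P.S. For illustration, reading two sensors back from the packed layout
(sensor 1 as bit 0 of b20, sensor 21 as the first array element):

  select get_bit(b20, 0) = 1 as m1s1,
         farr20[1]           as m1s21
  from m1
  where created between '2017-11-01' and '2017-11-30';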