Hi,

in fact, the MDF spec has this mostly not for compression (that's a nice side
effect) but rather to be able to really express the (physical) content of a
signal.
So my initial idea was to implement it as an optional layer on top of the
current TsFile which does the "interpretation", because what is stored in the
TsFile is always just a "primitive" series.
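
To make the layering concrete, here is a minimal sketch of such an
interpretation step for the voltage example (a = 1/100, b = 3). The class and
method names are made up for illustration; nothing here is existing IoTDB or
TsFile API. With these parameters 3 V maps to x = 0, 3.01 V to x = 1 and
4.2 V to x = 120.

```java
// Hypothetical helper, not part of IoTDB: converts between the physical value
// and the raw integer that would be stored in the primitive series.
final class LinearConversion {
    private final double a; // scale factor (assumed name)
    private final double b; // offset (assumed name)

    LinearConversion(double a, double b) {
        this.a = a;
        this.b = b;
    }

    // Write path: physical value -> raw integer x, so that physical ~= a * x + b.
    int toRaw(double physical) {
        return (int) Math.round((physical - b) / a);
    }

    // Read path: raw integer x -> physical value a * x + b.
    double toPhysical(int raw) {
        return a * raw + b;
    }
}

class LinearConversionDemo {
    public static void main(String[] args) {
        LinearConversion volts = new LinearConversion(0.01, 3.0);
        System.out.println(volts.toRaw(3.0));   // prints 0
        System.out.println(volts.toRaw(3.01));  // prints 1
        System.out.println(volts.toRaw(4.2));   // prints 120
    }
}
```

The stored series stays a plain integer series; only a reader that knows the
formula turns it back into volts.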

So the idea would be to store some metadata (like a formula, lookup table, ...)
on creation and use it on reading, but only optionally.
You can look at how Avro handles non-primitive types (they call them
logical types) here: https://avro.apache.org/docs/1.8.1/spec.html#Logical+Types
This is similar to my idea.
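
As a rough sketch of what "metadata plus optional interpretation" could look
like for the lookup-table case: the interface and class names below are
hypothetical, the 0/1 codes are assumed for illustration, and "NV" is used as
the fallback value when a raw code has no entry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical interface: interprets a raw primitive value using metadata
// that was stored when the series was created.
interface LogicalType<T> {
    Optional<T> interpret(int raw);
}

// Lookup-table metadata, e.g. 0 -> "OFF", 1 -> "ON".
// Codes without an entry yield Optional.empty().
final class LookupTable implements LogicalType<String> {
    private final Map<Integer, String> table;

    LookupTable(Map<Integer, String> table) {
        this.table = new HashMap<>(table); // defensive copy
    }

    @Override
    public Optional<String> interpret(int raw) {
        return Optional.ofNullable(table.get(raw));
    }
}

class LogicalTypeDemo {
    public static void main(String[] args) {
        Map<Integer, String> codes = new HashMap<>();
        codes.put(0, "OFF");
        codes.put(1, "ON");
        LogicalType<String> switchState = new LookupTable(codes);

        int[] rawSeries = {1, 0, 7}; // what the primitive series would contain
        for (int raw : rawSeries) {
            // The caller decides on the fallback for unknown codes.
            System.out.println(raw + " -> " + switchState.interpret(raw).orElse("NV"));
        }
        // prints "1 -> ON", "0 -> OFF", "7 -> NV"
    }
}
```

A reader that does not care about the interpretation can skip the
LogicalType entirely and just use the raw integers.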

Julian

On 29.10.19, 14:40, "Xiangdong Huang" <saint...@gmail.com> wrote:

    Hi,
    
    > Then it's most efficient to store integers and a formula like a * x + b
    > with e.g. b = 3 and a = 1/100.
    > So 3V would be stored as x = 0, 3.01V -> x = 1, ... 4.2V as x = 120.
    > So we only store 0 to 120 and no decimals and stuff which would be very
    > easily compressible I think.
    
    Good idea! Two thumbs up for that.
    
    But for cases like the above, implementing a new encoding method is better
    than a new data type.
    
    e.g., create time series root.a.b.voltage with encoding =
    linear_transformation, encoding_parameter = "describe the function like
    y = a * x + b" and datatype = INT.
    
    "linear_transformation" is the new encoding method.
    
    Now I get two cases from the discussion: one is like Optional data, and the
    other is data that can be transformed.
    So, do we want to support these two cases, or find a more general data type
    for "rich data types" (can the MDF format provide some inspiration)?
    
    Best,
    -----------------------------------
    Xiangdong Huang
    School of Software, Tsinghua University
    
     黄向东
    清华大学 软件学院
    
    
    Julian Feinauer <j.feina...@pragmaticminds.de> wrote on Tue, Oct 29, 2019 at 8:26 PM:
    
    > Hi Xiangdong,
    >
    > to your second question:
    > The use case is the other way round.
    > We know that we measure e.g. a voltage between 3V and 4.2V with a
    > precision of 0.01 or something.
    > Then it's most efficient to store integers and a formula like a * x + b
    > with e.g. b = 3 and a = 1/100.
    > So 3V would be stored as x = 0, 3.01V -> x = 1, ... 4.2V as x = 120.
    > So we only store 0 to 120 and no decimals and stuff which would be very
    > easily compressible I think.
    >
    > Julian
    >
    > On 29.10.19, 07:13, "Xiangdong Huang" <saint...@gmail.com> wrote:
    >
    >     Hi,
    >
    >     > In Java we could model it as a variable Optional<> x which could be
    >     > null, Optional.empty(), Optional.of(true), Optional.of(false).
    >
    >     It makes sense. And using a new data type to achieve this in IoTDB is
    >     OK.
    >
    >     > Or scale formulas like a*x+b which allows to leverage the precision
    >     > even for “small” double values or even integers.
    >
    >     So, are you considering a use case like: the time series value should
    >     be [1, 1, 0, 0, 1, 1, 1, 0, 0...] but actually we get [0.99, 0.99, 0.01,
    >     0, 1, 1, 0.999, 0, 0.01] (because of the precision of sensors)?
    >     And, what values do you want to save?
    >     (1) save them as 1 and 0, or
    >     (2) save them as 0.99, 0.01 as measured, but use a specific query API
    >     to return data like 1 and 0?
    >
    >     My other question is: is there a general data type that can support
    >     the above cases?
    >
    >     Best,
    >     -----------------------------------
    >     Xiangdong Huang
    >     School of Software, Tsinghua University
    >
    >      黄向东
    >     清华大学 软件学院
    >
    >
    >     Julian Feinauer <j.feina...@pragmaticminds.de> wrote on Tue, Oct 29, 2019 at 3:58 AM:
    >
    >     > Hi all,
    >     >
    >     > I wanted to discuss a possible new feature I will call Rich Datatypes
    >     > (RDT) API in the following.
    >     > I worked a lot in the automotive industry and there is a broadly adopted
    >     > open Standard called ASAM MDF (https://www.asam.net/standards/detail/mdf/).
    >     > It is a format which is targeted at the efficient storage but at the same
    >     > time it supports VERY complex types (which are often used in automotive
    >     > controllers).
    >     >
    >     > Take something as simple as a boolean. We could store it as a boolean (as
    >     > java bool) in 1 bit.
    >     > BUT we have overall 4 possibilities:
    >     >
    >     >   *   No value is available for a timestamp (NULL / nothing stored)
    >     >   *   We had a successful request but the Controller does not know whether
    >     >       true or false (or had an internal error), this is a bit like
    >     >       Optional.isPresent() == false
    >     >   *   True
    >     >   *   False
    >     > In Java we could model it as a variable Optional<> x which could be null,
    >     > Optional.empty(), Optional.of(true), Optional.of(false).
    >     >
    >     > Other examples are discrete values like “ON”, “OFF” (which are handled as
    >     > “lookup tables” on integer rows, internally).
    >     > Or scale formulas like a*x+b which allows to leverage the precision even
    >     > for “small” double values or even integers.
    >     > A formula but also a “fallback” lookup value like “NV”.
    >     >
    >     > I think this could be a valuable extension to IoTDB as an additional API
    >     > (not change anything below but just provide an API on top to do the
    >     > calculation).
    >     >
    >     > What do others think?
    >     >
    >     > Julian
    >     >
    >
    >
    >
    
