Re: Operation and robustness of iotDB

Xiangdong Huang Thu, 07 Mar 2019 16:41:08 -0800

Hi,

Yes, every Chunk has its own timestamp column. We designed it this way
because different sensors may have different sampling frequencies. For
example, the rotation speed of an engine may be collected 100 times per
second, while the GPS info is collected once per second. Their start times
may also not be aligned.


So, if you consider the data as a table (timestamp, device name, sensor1,
sensor2, ...), then it is quite a sparse table. Parquet introduces definition
and repetition levels to read data row by row, but we think that is not
so natural for time series data.
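
A rough sketch of how sparse such a table gets, assuming the two sampling
rates from the example above (100 samples per second and 1 sample per second)
and an arbitrary 5 ms start offset. The function name and the numbers are
illustrative only and are not part of TsFile:

```python
def null_fraction(columns):
    """Fraction of empty cells if all columns share one timestamp-keyed table.

    columns: dict mapping a sensor name to its list of timestamps.
    """
    all_ts = set()
    for ts in columns.values():
        all_ts.update(ts)
    cells = len(all_ts) * len(columns)          # one row per distinct timestamp
    filled = sum(len(set(ts)) for ts in columns.values())
    return (cells - filled) / cells


# Assumed sampling setup: 10 seconds of data, timestamps in milliseconds.
engine_ts = [i * 10 for i in range(1000)]       # 100 samples per second
gps_ts = [5 + i * 1000 for i in range(10)]      # 1 sample per second, offset 5 ms

fraction = null_fraction({"engine_speed": engine_ts, "gps": gps_ts})
print(fraction)  # 0.5: half of all cells in the wide table are empty
```

With more sensors at different rates, the share of empty cells grows further,
which is the motivation for giving each column its own timestamps.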

As a result, we store the timestamps with each column (that is, each Chunk).
Experience shows the disk overhead is small, and there are many advantages
for queries. (We do need a merge sort when querying more than one
measurement.)
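
The merge step mentioned above can be sketched as follows. This is only an
illustration of merging per-column timestamps at query time, not IoTDB's
actual query code; `merge_chunks` and the sample data are hypothetical, and
each input is assumed to be already sorted by timestamp:

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_chunks(chunks):
    """Merge per-measurement (timestamp, value) lists into timestamp-ordered rows.

    chunks: dict mapping a measurement name to a list of (timestamp, value)
            pairs, each list sorted by timestamp.
    Yields (timestamp, {measurement: value}) with only the measurements that
    actually have a value at that timestamp.
    """
    streams = [
        [(ts, name, value) for ts, value in points]
        for name, points in chunks.items()
    ]
    # k-way merge of the already sorted streams, keyed on the timestamp only
    merged = heapq.merge(*streams, key=itemgetter(0))
    for ts, group in groupby(merged, key=itemgetter(0)):
        yield ts, {name: value for _, name, value in group}


chunks = {
    "speed": [(0, 10.0), (10, 11.0), (20, 12.0)],
    "gps": [(10, "A"), (1010, "B")],
}
for row in merge_chunks(chunks):
    print(row)
```

Timestamps that appear in only one measurement yield a row with just that
measurement, so no null cells have to be stored on disk.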

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


Julian Feinauer <j.feina...@pragmaticminds.de> wrote on Fri, Mar 8, 2019 at 2:08 AM:

> Hey Xu Yi,
>
> thanks for the information.
> I checked the code and indeed I was wrong.
> Every Chunk also stores its timestamp.
>
> So when I read values through a query, all timestamps are "interpolated" or
> merged together from all sensors, right?
>
> Julian
>
> On 07.03.19, 18:48, "Xu yi" <xuyith...@126.com> wrote:
>
>     Hi,
>
>     In my opinion, different measurements use their own timestamps even
> though they are grouped into one chunk group. They don't share timestamps
> with each other.
>
>     What do you think of this, @xiangdong?
>
>     Thanks
>     XuYi
>
>     Sent from my iPhone
>
>     On 2019/03/08 at 1:41, Julian Feinauer <j.feina...@pragmaticminds.de> wrote:
>
>     > Hi,
>     >
>     > Yes this is what I meant.
>     >
>     > Julian
>     >
>     > Sent from my mobile phone
>     >
>     >
>     > -------- Original message --------
>     > Subject: Re: Operation and robustness of iotDB
>     > From: 徐毅
>     > To: dev@iotdb.apache.org
>     > Cc:
>     >
>     > Hi,
>     > In the definition of ChunkGroup, what is the meaning of 'share one
> time signal'? Do these measurements share same timestamps?
>     >
>     >
>     > Thanks
>     > XuYi
>     > On 3/8/2019 01:11, Julian Feinauer <j.feina...@pragmaticminds.de>
> wrote:
>     > Hey Xiangdong,
>     > hey all,
>     >
>     > I like the documentation a lot.
>     > The only thing I'm a bit unsure about is the names (as there is no
> clarification).
>     > So, before I update it with any wrong information I would like to
> ensure that I have the correct understanding.
>     >
>     > I assume that most naming is similar to Parquet.
>     >
>     > Page - Contains one measurement, smallest unit of compression
>     > Chunk - Collection of multiple Pages, still one measurement
>     > ChunkGroup - Collection of Chunks which share one time signal
> (one Chunk for each measurement)
>     >
>     > Is this correct?
>     >
>     > Julian
>     >
>     > On 05.03.19, 12:26, "Xiangdong Huang" <saint...@gmail.com> wrote:
>     >
>     > Hi,
>     >
>     > 1. We have a document to introduce that:
>     > https://cwiki.apache.org/confluence/display/IOTDB/TsFile+Format
>     >
>     > 2. The new API for recovering data is almost done. I am writing the
> UTs
>     > now. Maybe I can submit a PR tonight (if everything is fine...)
>     >
>     > Best,
>     > -----------------------------------
>     > Xiangdong Huang
>     > School of Software, Tsinghua University
>     >
>     > 黄向东
>     > 清华大学 软件学院
>     >
>     >
>     > Julian Feinauer <j.feina...@pragmaticminds.de> wrote on Tue, Mar 5, 2019
> at 6:00 PM:
>     >
>     > Hi Xiangdong,
>     >
>     > that sounds excellent.
>     > Do you have a short overview of how the file format is laid out on
> disk?
>     > I know that it's somewhat similar to Parquet, but I did not find more
>     > details.
>     > Basically, what would suffice for us would be something like skipping
> an
>     > invalid column group (or whatever you call it) and going on with the
> next.
>     >
>     > Julian
>     >
>     > On 04.03.19, 13:21, "Xiangdong Huang" <saint...@gmail.com> wrote:
>     >
>     > Hi,
>     >
>     > If so, I think I need to add a new API that allows you to continue
>     > writing data to an existing TsFile that was not closed correctly.
>     > Then everything is fine for you :D
>     >
>     > Best,
>     > -----------------------------------
>     > Xiangdong Huang
>     > School of Software, Tsinghua University
>     >
>     > 黄向东
>     > 清华大学 软件学院
>     >
>     >
>     > Julian Feinauer <j.feina...@pragmaticminds.de> wrote on Mon, Mar 4, 2019
> at 8:08 PM:
>     >
>     > Hey Xiangdong,
>     >
>     > thanks for the great explanation.
>     > And in fact, I agree with you that it would be best if we start to
>     > play around with it and report all our findings or wishes back to this
>     > list (in fact, that proved to be beneficial in PLC4X as well).
>     >
>     > You confirm my thoughts about the two "levels" of APIs (DB and file),
>     > and the file API is exactly what we were looking for in our use case,
>     > as we do not care much about data loss (when an edge device fails
>     > it's... gone).
>     > The crucial point for us is that no corrupt files can be generated.
>     > This means I'm fine when the last data submitted is lost but I'm not
>     > fine
>     > if we can get to a situation where the last datafile is completely
>     > lost
>     > (well, perhaps this could be acceptable).
>     >
>     > @tim: Perhaps it's best if you give Xiangdong some more information
>     > about our idea, and we can also point to our current code on GitHub.
>     >
>     > Julian
>     >
>     > On 04.03.19, 13:03, "Xiangdong Huang" <saint...@gmail.com> wrote:
>     >
>     > Hi,
>     >
>     > The TsFile API is not deprecated. In fact, it is designed for this
>     > scenario and for MapReduce/Spark computing.
>     >
>     > If you just use the Reader and Writer API, there is something you
>     > need to know:
>     >
>     > Let's suppose your block size is x bytes
>     > (tsfile-format.properties: group_size_in_byte).
>     >
>     > 1. If you write data and a shutdown occurs, all data that has been
>     > flushed to disk is OK, and you can read it (class
>     > org.apache.iotdb.tsfile.TsFileSequenceRead is an example, but you
>     > need to change it a little. I think I can write an example.)
>     >
>     > 2. Actually, TsFile can let you continue writing data at the end of
>     > an incomplete file. However, we do not provide this API yet...
>     > If needed, I can add it.
>     >
>     > 3. In this scenario, you will lose at most x bytes of data. If that
>     > is not acceptable, something like a WAL is needed. (It is not very
>     > complex, but I am not sure whether it should be built into TsFile.)
>     >
>     > So far, we can consider the TsFile API suitable for your scenario
>     > (even though we may need to add a few more APIs if you want them).
>     > You also get the ability to compress data and to query data from the
>     > TsFile rather than scanning it from head to tail.
>     >
>     > However, TsFile has one constraint: you cannot write out-of-order
>     > data into a TsFile, otherwise the query API may return incomplete
>     > results. But I think that is OK for real applications, because I do
>     > not think that a device can generate out-of-order data....
>     >
>     > For example, if you write two devices' data into one TsFile, it is
>     > OK if you write data like:
>     > - d1.t1, d1.t2, d2.t1, d2.t2, d2.t3, d1.t4, d1.t5 ....
>     > or:
>     > - d1.m1.t1, d1.m1.t2, d1.m2.t1, d1.m2.t2, d2.m1.t1 ...
>     >
>     > But you cannot write data like:
>     > - d1.m1.t2, d1.m1.t1 ...
>     >
>     > I think this is a good chance to improve TsFile and make it more
>     > suitable for real applications, so please do not hesitate to tell me
>     > more about what you think TsFile should have.
>     >
>     > Best,
>     > -----------------------------------
>     > Xiangdong Huang
>     > School of Software, Tsinghua University
>     >
>     > 黄向东
>     > 清华大学 软件学院
>     >
>     >
>     > Julian Feinauer <j.feina...@pragmaticminds.de> wrote on Mon, Mar 4, 2019
>     > at 7:17 PM:
>     >
>     > Hi Xiangdong,
>     >
>     > thanks for the info.
>     > What about the case where you use the Reader/Writer API for the
>     > TsFiles directly (or should this be considered "deprecated")?
>     > Can these files end up in a corrupted state?
>     >
>     > One situation where we have to deal with this is "at the edge",
>     > where we have devices inside large machines.
>     > Usually at the end of the shift these machines (and therefore our
>     > device) are powered off hard, so no shutdown or de-initialization is
>     > possible.
>     >
>     > Best
>     > Julian
>     >
>     > On 04.03.19, 12:14, "Xiangdong Huang" <saint...@gmail.com> wrote:
>     >
>     > Hi,
>     >
>     > IoTDB can run either on a server operating 24/7 or on a Raspberry Pi.
>     > We have tested both scenarios.
>     >
>     > When you forcibly shut down an IoTDB instance (e.g., power off) and
>     > restart it, no data is lost (if you enable the WAL).
>     >
>     > However, we have not yet optimized the time cost of the restart
>     > process. This is an important feature that we need to work on, because
>     > we hope IoTDB can support data management both on edge devices and in
>     > the data center.
>     >
>     > Also, the default configuration is not well suited to running on an
>     > edge device (e.g., the block size is 128MB, which is too large for a
>     > Raspberry Pi and will slow down the restart process because there is
>     > too much WAL data on disk).
>     >
>     > Best,
>     > -----------------------------------
>     > Xiangdong Huang
>     > School of Software, Tsinghua University
>     >
>     > 黄向东
>     > 清华大学 软件学院
>     >
>     >
>     > Tim Mitsch <t.mit...@pragmaticindustries.de> wrote on Mon, Mar 4, 2019
>     > at 6:53 PM:
>     >
>     > Hello development team,
>     >
>     > First of all, thanks for developing such an interesting project and
>     > bringing it into the Apache Incubator.
>     >
>     > I have a question regarding the place of operation and robustness:
>     >
>     > *   Is IoTDB designed as an application on a server that is running
>     >     24/7, or
>     > *   is it also possible to run it on a device like a Raspberry Pi or
>     >     an IPC, where operation can be interrupted?
>     >
>     > I'm asking because I'm searching for a solution for temporary storage
>     > that is robust against spontaneous interruption, e.g. switching off the
>     > electricity without a regular shutdown of the OS. Have you tested
>     > something like this yet?
>     >
>     > Best regards
>     > Tim
>
>
>
>
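
The ordering constraint quoted above (no out-of-order timestamps within one
device and measurement) can be sketched as a small guard placed in front of a
writer. This is a hypothetical helper, not part of the TsFile API; the class
name `OrderGuard` is made up, and treating equal timestamps as out of order
is an assumption, since the thread only discusses decreasing timestamps:

```python
class OrderGuard:
    """Tracks the last timestamp seen per (device, measurement) series."""

    def __init__(self):
        self._last = {}

    def accept(self, device, measurement, timestamp):
        """Return True if the point keeps its series in increasing time order."""
        key = (device, measurement)
        last = self._last.get(key)
        # Assumption: timestamps must be strictly increasing within a series.
        if last is not None and timestamp <= last:
            return False
        self._last[key] = timestamp
        return True


guard = OrderGuard()
# Interleaving devices and measurements is fine, as in the quoted examples.
ok = [
    guard.accept("d1", "m1", 1),
    guard.accept("d1", "m1", 2),
    guard.accept("d1", "m2", 1),
    guard.accept("d2", "m1", 1),
]
# Going back in time within d1.m1 is rejected.
rejected = guard.accept("d1", "m1", 1)
```

A caller could drop, buffer, or log rejected points instead of handing them
to the writer; which of these is appropriate depends on the application.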
