Hey Xu Yi,

thanks for the information.
I checked the code and indeed I was wrong.
Every Chunk also stores its own timestamps.

So when I read values through a Query, are all timestamps "interpolated" or
merged together from all sensors?
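
To make sure I'm asking the right thing, here is roughly what I imagine on the
reading side (only a sketch from memory; the class and method names, e.g.
ReadOnlyTsFile and QueryExpression, may not be exact):

    import java.util.Arrays;

    import org.apache.iotdb.tsfile.read.ReadOnlyTsFile;
    import org.apache.iotdb.tsfile.read.TsFileSequenceReader;
    import org.apache.iotdb.tsfile.read.common.Path;
    import org.apache.iotdb.tsfile.read.expression.QueryExpression;
    import org.apache.iotdb.tsfile.read.query.dataset.QueryDataSet;

    public class MergedTimestampQuestion {
        public static void main(String[] args) throws Exception {
            TsFileSequenceReader reader = new TsFileSequenceReader("test.tsfile");
            ReadOnlyTsFile tsFile = new ReadOnlyTsFile(reader);

            // query two sensors of the same device in one go
            QueryExpression query = QueryExpression.create(
                    Arrays.asList(new Path("d1.m1"), new Path("d1.m2")), null);
            QueryDataSet result = tsFile.query(query);

            // my assumption: one row per distinct timestamp, with the values of
            // both sensors merged into that row (and gaps where a sensor has no value)
            while (result.hasNext()) {
                System.out.println(result.next());
            }
            reader.close();
        }
    }

That is what I mean by "merged together".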

Julian

On 07.03.19, 18:48, "Xu yi" <xuyith...@126.com> wrote:

    Hi,
    
    In my opinion, different measurements use their own timestamps even though
    they are grouped into one chunk group; they do not share timestamps with each other.
    
    What do you think of this, @xiangdong?
    
    Thanks 
    XuYi 
    
    Sent from my iPhone
    
    On 2019/03/08 1:41, Julian Feinauer <j.feina...@pragmaticminds.de> wrote:
    
    > Hi,
    > 
    > Yes this is what I meant.
    > 
    > Julian
    > 
    > Sent from my mobile phone
    > 
    > 
    > -------- Original Message --------
    > Subject: Re: Operation and robustness of iotDB
    > From: 徐毅
    > To: dev@iotdb.apache.org
    > Cc:
    > 
    > Hi,
    > In the definition of ChunkGroup, what is the meaning of 'share one time
    > signal'? Do these measurements share the same timestamps?
    > 
    > 
    > Thanks
    > XuYi
    > On 3/8/2019 01:11, Julian Feinauer <j.feina...@pragmaticminds.de> wrote:
    > Hey Xiangdong,
    > hey all,
    > 
    > I like the documentation a lot.
    > The only thing I'm a bit unsure about is the names (as there is no
    > clarification).
    > So, before I update it with any wrong information, I would like to make
    > sure that my understanding is correct.
    > 
    > I assume that most naming is similar to Parquet.
    > 
    > Page - contains data of one measurement; the smallest unit of compression
    > Chunk - a collection of multiple Pages, still for one measurement
    > ChunkGroup - a collection of Chunks which share one time signal (one
    > Chunk for each measurement)
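    > 
    > In picture form, my understanding is roughly (my own sketch, not taken
    > from the documentation):
    > 
    >     TsFile
    >       ChunkGroup  (all Chunks sharing one time signal, e.g. device d1)
    >         Chunk for d1.m1  -> Page, Page, ...
    >         Chunk for d1.m2  -> Page, Page, ...
    >       ChunkGroup  ...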
    > 
    > Is this correct?
    > 
    > Julian
    > 
    > On 05.03.19, 12:26, "Xiangdong Huang" <saint...@gmail.com> wrote:
    > 
    > Hi,
    > 
    > 1. We have a document to introduce that:
    > https://cwiki.apache.org/confluence/display/IOTDB/TsFile+Format
    > 
    > 2. The new API for recovering data is almost done. I am writing the UTs
    > now. Maybe I can submit a PR tonight (if everything is fine...)
    > 
    > Best,
    > -----------------------------------
    > Xiangdong Huang
    > School of Software, Tsinghua University
    > 
    > 黄向东
    > 清华大学 软件学院
    > 
    > 
    > Julian Feinauer <j.feina...@pragmaticminds.de> wrote on Tuesday, March 5, 2019 at 6:00 PM:
    > 
    > Hi Xiangdong,
    > 
    > that sounds excellent.
    > Do you have a short overview of how the file format is designed on disk?
    > I know that it's somewhat similar to Parquet, but I did not find more
    > details.
    > Basically, what would suffice for us is something like skipping an
    > invalid column group (or whatever you call it) and continuing with the next one.
    > 
    > Julian
    > 
    > On 04.03.19, 13:21, "Xiangdong Huang" <saint...@gmail.com> wrote:
    > 
    > Hi,
    > 
    > If so, I think I need to add a new API that allows you to continue
    > writing data into an existing TsFile that was not closed correctly.
    > Then everything is fine for you :D
    > 
    > Best,
    > -----------------------------------
    > Xiangdong Huang
    > School of Software, Tsinghua University
    > 
    > 黄向东
    > 清华大学 软件学院
    > 
    > 
    > Julian Feinauer <j.feina...@pragmaticminds.de> wrote on Monday, March 4, 2019 at 8:08 PM:
    > 
    > Hey Xiangdong,
    > 
    > thanks for the great explanation.
    > And in fact, I agree with you that it would be best if we start to
    > play around with it and report all our findings and wishes back to this
    > list (in fact, that proved to be beneficial in plc4x as well).
    > 
    > You confirm my thoughts about the two "levels" of APIs (DB and file),
    > and the file API is exactly what we were looking for for our use case.
    > We do not care much about data loss (when an edge device fails, it's...
    > gone).
    > The crucial point for us is that no corrupt files can be generated.
    > This means I'm fine if the last data submitted is lost, but I'm not
    > fine if we can get into a situation where the last data file is
    > completely lost (well, perhaps even that could be acceptable).
    > 
    > @tim: Perhaps it's best if you give Xiangdong some more information
    > about our idea, and we can also point him to our current code on GitHub.
    > 
    > Julian
    > 
    > On 04.03.19, 13:03, "Xiangdong Huang" <saint...@gmail.com> wrote:
    > 
    > Hi,
    > 
    > The TsFile API is not deprecated. In fact, it is designed for this
    > scenario and for MapReduce/Spark computing.
    > 
    > If you just use the Reader and Writer API, there is something you
    > need to know:
    > 
    > Let's suppose your block size is x bytes
    > (tsfile-format.properties: group_size_in_byte).
    > 
    > 1. If you write data and a shutdown occurs, then all data that has been
    > flushed to disk is ok, and you can read it (the class
    > org.apache.iotdb.tsfile.TsFileSequenceRead is an example, but you need
    > to change it a little; I think I can write an example).
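    > 
    > Something like the following is what I have in mind (just a sketch,
    > untested; the reader method names are assumed and may need adjusting):
    > 
    >     import org.apache.iotdb.tsfile.read.TsFileSequenceReader;
    > 
    >     public class CheckTsFile {
    >         public static void main(String[] args) throws Exception {
    >             TsFileSequenceReader reader = new TsFileSequenceReader("data.tsfile");
    >             // a correctly closed file ends with the same magic string it starts with;
    >             // if it does not, only the ChunkGroups flushed before the shutdown are readable
    >             boolean closedProperly =
    >                     reader.readHeadMagic().equals(reader.readTailMagic()); // method names assumed
    >             System.out.println("file closed properly: " + closedProperly);
    >             reader.close();
    >         }
    >     }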
    > 
    > 2. Actually, TsFile has the ability to let you continue writing data
    > at the end of an incomplete file. However, we do not provide this API
    > now... If needed, I can add it.
    > 
    > 3. In this scenario, you will lose at most x bytes of data. If you
    > cannot accept that, something like a WAL is needed. (It is not very
    > complex, but I am not sure whether it should be an embedded function
    > of TsFile.)
    > 
    > So far, we can consider the TsFile API suitable for your scenario
    > (even though we may need to add a little more API if you desire). And
    > you get the ability to compress data and to query data from the TsFile
    > rather than scanning it from head to tail.
    > 
    > However, TsFile has one constraint: you cannot write out-of-order data
    > into a TsFile, otherwise the query API may return incomplete results.
    > But I think this is ok for real applications, because I do not think
    > that a device can generate out-of-order data....
    > 
    > For example, if you write two devices' data into one TsFile, it is ok
    > if you write data like:
    > - d1.t1, d1.t2, d2.t1, d2.t2, d2.t3, d1.t4, d1.t5 ....
    > or:
    > - d1.m1.t1, d1.m1.t2, d1.m2.t1, d1.m2.t2, d2.m1.t1 ...
    > 
    > But you cannot write data like:
    > - d1.m1.t2, d1.m1.t1 ...
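    > 
    > As a sketch with the write API (class names roughly as in the tsfile
    > module; the schema registration call may differ in your version):
    > 
    >     import java.io.File;
    > 
    >     import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType;
    >     import org.apache.iotdb.tsfile.file.metadata.enums.TSEncoding;
    >     import org.apache.iotdb.tsfile.write.TsFileWriter;
    >     import org.apache.iotdb.tsfile.write.record.TSRecord;
    >     import org.apache.iotdb.tsfile.write.record.datapoint.FloatDataPoint;
    >     import org.apache.iotdb.tsfile.write.schema.MeasurementSchema;
    > 
    >     public class InOrderWrite {
    >         public static void main(String[] args) throws Exception {
    >             TsFileWriter writer = new TsFileWriter(new File("d1.tsfile"));
    >             writer.addMeasurement(
    >                     new MeasurementSchema("m1", TSDataType.FLOAT, TSEncoding.RLE));
    > 
    >             // ok: timestamps for the same device/measurement are non-decreasing
    >             TSRecord first = new TSRecord(1, "d1");
    >             first.addTuple(new FloatDataPoint("m1", 1.0f));
    >             writer.write(first);
    > 
    >             TSRecord second = new TSRecord(2, "d1");
    >             second.addTuple(new FloatDataPoint("m1", 2.0f));
    >             writer.write(second);
    > 
    >             // not ok: writing d1.m1 with timestamp 1 again at this point
    >             // would be the out-of-order case described above
    > 
    >             writer.close();
    >         }
    >     }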
    > 
    > I think this is a good chance to improve TsFile and make it more
    > suitable for real applications, so please do not hesitate to tell me
    > more about what you think TsFile should have.
    > 
    > Best,
    > -----------------------------------
    > Xiangdong Huang
    > School of Software, Tsinghua University
    > 
    > 黄向东
    > 清华大学 软件学院
    > 
    > 
    > Julian Feinauer <j.feina...@pragmaticminds.de> wrote on Monday, March 4, 2019 at 7:17 PM:
    > 
    > Hi Xiangdong,
    > 
    > thanks for the info.
    > How does it behave when you use the Reader/Writer API for the TsFiles
    > directly (or should this be considered "deprecated")?
    > Can these files end up in a corrupted state?
    > 
    > One situation where we have to deal with this is "at the edge", when
    > we have devices inside large machines.
    > Usually, at the end of a shift these machines (and therefore our
    > devices) are powered off hard, so no shutdown or de-initialization is
    > possible.
    > 
    > Best
    > Julian
    > 
    > On 04.03.19, 12:14, "Xiangdong Huang" <saint...@gmail.com> wrote:
    > 
    > Hi,
    > 
    > IoTDB can run either on a server 24/7 or on a Raspberry Pi. We have
    > tested both scenarios.
    > 
    > When you shut down an IoTDB instance forcefully (e.g., power off) and
    > restart it, no data is lost (if you enable the WAL).
    > 
    > However, we currently do not optimize the time cost of the restart
    > process. It is an important feature we need to work on, because we
    > hope IoTDB can support data management both on edge devices and in
    > the data center.
    > 
    > Also, the default configuration is not so suitable for running on an
    > edge device (e.g., the block size is 128MB, which is too large for a
    > Raspberry Pi and will slow down the restart process because there is
    > too much WAL data on disk).
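    > 
    > For an edge device you would tune something like the following
    > (values only illustrative; group_size_in_byte is the property
    > mentioned above, the WAL property name is from memory):
    > 
    >     # tsfile-format.properties: use a much smaller block size than the 128MB default
    >     group_size_in_byte=1048576
    > 
    >     # iotdb-engine.properties: keep the WAL enabled so a hard power-off loses no data
    >     enable_wal=true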
    > 
    > Best,
    > -----------------------------------
    > Xiangdong Huang
    > School of Software, Tsinghua University
    > 
    > 黄向东
    > 清华大学 软件学院
    > 
    > 
    > Tim Mitsch <t.mit...@pragmaticindustries.de> wrote on Monday, March 4, 2019 at 6:53 PM:
    > 
    > Hello development team,
    > 
    > first of all, thanks for developing this kind of interesting project
    > and bringing it into the Apache Incubator.
    > 
    > I have a question regarding the place of operation and robustness:
    > 
    > *   Is IoTDB designed as an application for a server which is running
    > 24/7, or
    > *   is it also possible to run it on a device like a Raspberry Pi or an
    > IPC, where operation can be interrupted?
    > 
    > I'm asking because I'm searching for a solution for a temporary storage
    > that is robust against spontaneous interruption, e.g. switching off the
    > electricity without a regular shutdown of the OS – have you tested
    > something like this yet?
    > 
    > Best regards
    > Tim
    > 
    > 
    
    
