Hey,

thank you for the link... I did not know about this. This is exactly what I was
looking for!

Julian

PS.: Looking forward to your PR : )

On 05.03.19, 12:26, "Xiangdong Huang" <saint...@gmail.com> wrote:

    Hi,
    
    1. We have a document that describes it:
    https://cwiki.apache.org/confluence/display/IOTDB/TsFile+Format
    
    2. The new API for recovering data is almost done. I am writing the UTs
    now. Maybe I can submit a PR tonight (if everything is fine...)
    
    Best,
    -----------------------------------
    Xiangdong Huang
    School of Software, Tsinghua University
    
     黄向东
    清华大学 软件学院
    
    
    Julian Feinauer <j.feina...@pragmaticminds.de> wrote on Tue, Mar 5, 2019 at 6:00 PM:
    
    > Hi Xiangdong,
    >
    > that sounds excellent.
    > Do you have a short overview of how the file format is laid out on disk?
    > I know that it's somewhat similar to Parquet, but I did not find more
    > details.
    > Basically, it would suffice for us to be able to skip an invalid column
    > group (or whatever you call it) and continue with the next one.
    >
    > Julian
    >
    > On 04.03.19, 13:21, "Xiangdong Huang" <saint...@gmail.com> wrote:
    >
    >     Hi,
    >
    >     If so, I think I need to add a new API that lets you continue writing data
    >     to an existing TsFile that was not closed correctly. Then everything is
    >     fine for you :D
    >
    >     Best,
    >     -----------------------------------
    >     Xiangdong Huang
    >     School of Software, Tsinghua University
    >
    >      黄向东
    >     清华大学 软件学院
    >
    >
    >     Julian Feinauer <j.feina...@pragmaticminds.de> wrote on Mon, Mar 4, 2019 at 8:08 PM:
    >
    >     > Hey Xiangdong,
    >     >
    >     > thanks for the great explanation.
    >     > And in fact, I agree with you that it would be best if we start to play
    >     > around with it and report all our findings or wishes back to this list
    >     > (in fact, that proved to be beneficial in PLC4X as well).
    >     >
    >     > You confirm my thoughts about the two "levels" of APIs (DB and file), and
    >     > the file API is exactly what we were looking for for our use case,
    >     > as we do not care much about data loss (when an edge device fails, it's...
    >     > gone).
    >     > The crucial point for us is that no corrupt files can be generated.
    >     > This means I'm fine when the last data submitted is lost, but I'm not fine
    >     > if we can get into a situation where the last data file is completely lost
    >     > (well, perhaps this could be acceptable).
    >     >
    >     > @tim: Perhaps it's best if you give Xiangdong some more information
    >     > about our idea, and we can also point to our current code on GitHub.
    >     >
    >     > Julian
    >     >
    >     > On 04.03.19, 13:03, "Xiangdong Huang" <saint...@gmail.com> wrote:
    >     >
    >     >     Hi,
    >     >
    >     >     TsFile API is not deprecated. In fact, it is designed for this scenario
    >     >     and for MapReduce/Spark computing.
    >     >
    >     >     If you just use the Reader and Writer API, there is something you need
    >     >     to know:
    >     >
    >     >     Let's suppose your block size is x bytes (tsfile-format.properties:
    >     >     group_size_in_byte).
    >     >
    >     >     1. If you write data and a shutdown occurs, all data that has already
    >     >     been flushed to disk is fine, and you can read it (the class
    >     >     org.apache.iotdb.tsfile.TsFileSequenceRead is an example, but you need
    >     >     to change it a little. I think I can write an example.)
    >     >
    >     >     2. Actually, TsFile is able to let you continue writing data at the
    >     >     end of an incomplete file. However, we do not provide this API
    >     >     now... If needed, I can add the API.
    >     >
    >     >     3. In this scenario, you will lose at most x bytes of data. If you
    >     >     cannot accept that, something like a WAL is needed. (It is not very
    >     >     complex, but I am not sure whether it should be an embedded function
    >     >     of TsFile.)
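
Point 3 can be illustrated with a toy model. This is a Python sketch, not TsFile code: it assumes the writer keeps data in memory and flushes one block as soon as block_size bytes have accumulated, and the class and method names are invented for the example. Under that assumption, a hard power-off can only lose what is still sitting in the buffer, which is always less than one block.

```python
class BufferedBlockWriter:
    """Toy writer: data stays in memory until a full block can be flushed."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.buffer = bytearray()    # not yet durable
        self.flushed = bytearray()   # stands in for bytes safely on disk

    def write(self, payload):
        self.buffer += payload
        # Flush complete blocks; a partial block stays in memory.
        while len(self.buffer) >= self.block_size:
            self.flushed += self.buffer[:self.block_size]
            del self.buffer[:self.block_size]

    def crash(self):
        """Simulate a hard power-off and return the number of bytes lost."""
        lost = len(self.buffer)
        self.buffer.clear()
        return lost


writer = BufferedBlockWriter(block_size=16)
for _ in range(10):
    writer.write(b"x" * 7)           # ten 7-byte records
lost = writer.crash()
print(len(writer.flushed), "bytes on disk,", lost, "bytes lost")
```

A smaller block size shrinks the worst-case loss at the cost of more frequent flushes; a WAL would be the way to close the remaining gap.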
    >     >
    >     >     So far, we can consider the TsFile API suitable for your scenario
    >     >     (even though we need to add a little more API if you wish). You also
    >     >     get the ability to compress data and to query data from the TsFile
    >     >     rather than scanning it from head to tail.
    >     >
    >     >     However, TsFile has one constraint: you cannot write out-of-order data
    >     >     into a TsFile, otherwise the query API may return incomplete results.
    >     >     But I think that is OK for real applications, because I do not think
    >     >     that a device can generate out-of-order data....
    >     >
    >     >     For example, if you write two devices' data into one TsFile, it is OK
    >     >     if you write data like:
    >     >     - d1.t1, d1.t2, d2.t1, d2.t2, d2.t3, d1.t4, d1.t5 ....
    >     >     or:
    >     >     - d1.m1.t1, d1.m1.t2, d1.m2.t1, d1.m2.t2, d2.m1.t1 ...
    >     >
    >     >     But you can not write data like:
    >     >     - d1.m1.t2, d1.m1.t1 ...
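
The ordering rule above can be sketched as a small check. This is only an illustration in Python, not part of the TsFile API: the function name and the (device, measurement, timestamp) tuple layout are made up for the example. The thread does not say how equal timestamps are handled, so the sketch lets them pass and only flags timestamps that go backwards.

```python
def find_out_of_order(points):
    """Return the points whose timestamp goes backwards within their own series.

    `points` is an iterable of (device, measurement, timestamp) tuples in the
    order they would be written. Hypothetical helper, not a TsFile call.
    """
    latest = {}       # (device, measurement) -> largest timestamp seen so far
    violations = []
    for device, measurement, timestamp in points:
        key = (device, measurement)
        if key in latest and timestamp < latest[key]:
            violations.append((device, measurement, timestamp))
        else:
            latest[key] = timestamp
    return violations


# Interleaving devices is fine as long as each series moves forward in time.
ok = [("d1", "m1", 1), ("d1", "m1", 2), ("d2", "m1", 1),
      ("d2", "m1", 2), ("d2", "m1", 3), ("d1", "m1", 4), ("d1", "m1", 5)]
# Same series, timestamp 2 written before timestamp 1.
bad = [("d1", "m1", 2), ("d1", "m1", 1)]

print(find_out_of_order(ok))
print(find_out_of_order(bad))
```

Running such a check on the edge device before handing points to the writer would catch the forbidden pattern early.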
    >     >
    >     >     I think this is a good chance to improve TsFile and make it more
    >     >     suitable for real applications, so please do not hesitate to tell me
    >     >     more about what you think TsFile should have.
    >     >
    >     >     Best,
    >     >     -----------------------------------
    >     >     Xiangdong Huang
    >     >     School of Software, Tsinghua University
    >     >
    >     >      黄向东
    >     >     清华大学 软件学院
    >     >
    >     >
    >     >     Julian Feinauer <j.feina...@pragmaticminds.de> wrote on Mon, Mar 4, 2019 at 7:17 PM:
    >     >
    >     >     > Hi Xiangdong,
    >     >     >
    >     >     > thanks for the info.
    >     >     > What about the case where you use the Reader / Writer API for the
    >     >     > TsFiles directly (or should this be considered "deprecated")?
    >     >     > Can these files end up in a corrupted state?
    >     >     >
    >     >     > One place where we have to deal with this is "at the edge",
    >     >     > when we have devices inside large machines.
    >     >     > Usually, at the end of the shift these machines (and therefore our
    >     >     > device) are powered off hard, so no shutdown or de-initialization is
    >     >     > possible.
    >     >     >
    >     >     > Best
    >     >     > Julian
    >     >     >
    >     >     > On 04.03.19, 12:14, "Xiangdong Huang" <saint...@gmail.com> wrote:
    >     >     >
    >     >     >     Hi,
    >     >     >
    >     >     >     IoTDB can run either on a 24/7 server or on a RaspberryPi. We have
    >     >     >     tested both scenarios.
    >     >     >
    >     >     >     When you forcibly shut down an IoTDB instance (e.g., power off) and
    >     >     >     restart it, no data is lost (if you enable the WAL).
    >     >     >
    >     >     >     However, we have not yet optimized the time cost of the restart
    >     >     >     process. It is an important feature that we need to work on, because
    >     >     >     we hope IoTDB can support data management both on edge devices and in
    >     >     >     the data center.
    >     >     >
    >     >     >     Also, the default configuration is not well suited to running on an
    >     >     >     edge device (e.g., the block size is 128MB, which is too large for a
    >     >     >     RaspberryPi and will slow down the restart process because there is
    >     >     >     too much WAL data on disk).
    >     >     >
    >     >     >     Best,
    >     >     >     -----------------------------------
    >     >     >     Xiangdong Huang
    >     >     >     School of Software, Tsinghua University
    >     >     >
    >     >     >      黄向东
    >     >     >     清华大学 软件学院
    >     >     >
    >     >     >
    >     >     >     Tim Mitsch <t.mit...@pragmaticindustries.de> wrote on Mon, Mar 4, 2019 at 6:53 PM:
    >     >     >
    >     >     >     > Hello development team,
    >     >     >     >
    >     >     >     > First of all, thanks for developing this interesting project and
    >     >     >     > bringing it into the Apache Incubator.
    >     >     >     >
    >     >     >     > I have a question regarding the place of operation and robustness:
    >     >     >     >
    >     >     >     >   *   Is IoTDB designed as an application for a server that is
    >     >     >     >       running 24/7, or
    >     >     >     >   *   is it also possible to run it on a device like a RaspberryPi
    >     >     >     >       or an IPC, where operation can be interrupted?
    >     >     >     >
    >     >     >     > I'm asking because I'm searching for a temporary storage solution
    >     >     >     > that is robust against spontaneous interruption, e.g. the power being
    >     >     >     > switched off without a regular OS shutdown. Have you tested something
    >     >     >     > like this yet?
    >     >     >     >
    >     >     >     > Best regards
    >     >     >     > Tim
    >     >     >     >
    >     >     >     >
    >     >     >     >
    >     >     >
    >     >     >
    >     >     >
    >     >
    >     >
    >     >
    >
    >
    >
    
