Hi,

If so, I think I need to add a new API that lets you continue writing data to an existing TsFile that was not closed correctly. Then everything should be fine for you :D

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

黄向东
清华大学 软件学院


Julian Feinauer <[email protected]> wrote on Mon, Mar 4, 2019 at 8:08 PM:

> Hey Xiangdong,
>
> thanks for the great explanation.
> And in fact, I agree with you that it would be best if we start to play
> around with it and report all our findings or wishes back to this list
> (in fact, that proved to be beneficial in plc4x as well).
>
> You confirm my thoughts about the two "levels" of APIs (DB and file),
> and the file API is exactly what we were looking for for our use case.
> We do not care much about data loss (when an edge device fails, it's...
> gone).
> The crucial point for us is that no corrupt files can be generated.
> This means I'm fine when the last data submitted is lost, but I'm not
> fine if we can get to a situation where the last data file is
> completely lost (well, perhaps this could be acceptable).
>
> @tim: Perhaps it's best if you give Xiangdong some more information
> about our idea, and we can also point to our current code on GitHub.
>
> Julian
>
> On 04.03.19, 13:03, "Xiangdong Huang" <[email protected]> wrote:
>
> Hi,
>
> The TsFile API is not deprecated. In fact, it is designed for this
> scenario and for MapReduce/Spark computing.
>
> If you just use the Reader and Writer API, there is something you need
> to know:
>
> Let's suppose your block size is x bytes (tsfile-format.properties:
> group_size_in_byte).
>
> 1. If you write data and a shutdown occurs, then all data that has been
> flushed to disk is OK, and you can read it (class
> org.apache.iotdb.tsfile.TsFileSequenceRead is an example, but you need
> to change it a little. I think I can write an example.)
>
> 2. Actually, TsFile has the ability to let you continue writing data at
> the end of the incomplete file. However, we do not provide this API
> now... If needed, I can add the API.
>
> 3. In this scenario, you will lose at most x bytes of data. If you do
> not accept that, something like a WAL is needed. (It is not very
> complex, but I am not sure whether it should be an embedded function
> of TsFile.)
>
> Up to now, we can consider the TsFile API suitable for your scenario
> (even though we need to add a little more API if you desire). And you
> get the ability to compress data and to query data from the TsFile
> rather than scanning the data from head to tail.
>
> However, TsFile has one constraint: you cannot write out-of-order data
> into a TsFile; otherwise the query API may return incomplete results.
> But I think it is OK for real applications, because I do not think
> that a device can generate out-of-order data....
>
> For example, if you write two devices' data into one TsFile, it is OK
> if you write data like:
> - d1.t1, d1.t2, d2.t1, d2.t2, d2.t3, d1.t4, d1.t5 ....
> or:
> - d1.m1.t1, d1.m1.t2, d1.m2.t1, d1.m2.t2, d2.m1.t1 ...
>
> But you cannot write data like:
> - d1.m1.t2, d1.m1.t1 ...
>
> I think it is a good chance to improve TsFile to make it more suitable
> for real applications, so please do not hesitate to tell me more about
> what you think TsFile should have.
>
> Best,
> -----------------------------------
> Xiangdong Huang
> School of Software, Tsinghua University
>
> 黄向东
> 清华大学 软件学院
>
>
> Julian Feinauer <[email protected]> wrote on Mon, Mar 4, 2019 at 7:17 PM:
>
> > Hi Xiangdong,
> >
> > thanks for the info.
> > How is it in the case when you use the Reader / Writer API for the
> > TsFiles directly (or should this be considered "deprecated")?
> > Can these files end up in a corrupted state?
> >
> > One situation where we have to deal with this is "at the edge", when
> > we have devices inside large machines.
> > Usually at the end of the shift these machines (and therefore our
> > devices) are powered off hard, so no shutdown or de-initialization
> > is possible.
> >
> > Best
> > Julian
> >
> > On 04.03.19, 12:14, "Xiangdong Huang" <[email protected]> wrote:
> >
> > Hi,
> >
> > IoTDB can run either on a server running 7*24 or on a RaspberryPi.
> > We have tested both scenarios.
> >
> > When you shut down an IoTDB instance by force (e.g., power off) and
> > restart it, no data is lost (if you enable the WAL).
> >
> > However, currently we do not optimize the time cost of the restart
> > process. It is an important feature that we need to work on, because
> > we hope IoTDB can support data management both on edge devices and
> > in the data center.
> >
> > Also, the default configuration is not well suited to running on an
> > edge device (e.g., the block size is 128MB, which is too large for a
> > RaspberryPi and will slow down the restart process because there is
> > too much WAL data on disk).
> >
> > Best,
> > -----------------------------------
> > Xiangdong Huang
> > School of Software, Tsinghua University
> >
> > 黄向东
> > 清华大学 软件学院
> >
> >
> > Tim Mitsch <[email protected]> wrote on Mon, Mar 4, 2019 at 6:53 PM:
> >
> > > Hello development-team
> > >
> > > First of all, thanks for developing this kind of interesting
> > > project and bringing it into the Apache Incubator.
> > >
> > > I have a question regarding the place of operation and robustness:
> > >
> > > * Is IoTDB conceived as an application on a server which is
> > >   running 24/7, or
> > > * Is it also possible to run it on a device like a RaspberryPi or
> > >   IPC, where operation can be interrupted?
> > >
> > > I'm asking because I'm searching for a solution for temporary
> > > storage that is robust against spontaneous interruption, e.g.
> > > switching off electricity without a regular shutdown of the OS.
> > > Have you tested something like this yet?
> > >
> > > Best regards
> > > Tim
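
The ordering constraint discussed in the quoted thread (no out-of-order data within a TsFile) can be illustrated with a small pre-write check. This is only a sketch under assumptions: `WriteOrderCheck`, `Point`, and `firstOutOfOrder` are hypothetical names and not part of the TsFile API, and "in order" is taken to mean that timestamps never go backwards within one device/measurement pair. How equal timestamps should be treated is not covered in the thread, so they are accepted here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; it does not use the TsFile API.
final class WriteOrderCheck {

    /** One data point: device id, measurement id, timestamp. */
    static final class Point {
        final String device;
        final String measurement;
        final long time;

        Point(String device, String measurement, long time) {
            this.device = device;
            this.measurement = measurement;
            this.time = time;
        }
    }

    /**
     * Returns the index of the first point whose timestamp is smaller than
     * the last timestamp already seen for the same device and measurement,
     * or -1 if no such point exists.
     */
    static int firstOutOfOrder(List<Point> points) {
        Map<String, Long> lastTime = new HashMap<>();
        for (int i = 0; i < points.size(); i++) {
            Point p = points.get(i);
            String series = p.device + "." + p.measurement;
            Long previous = lastTime.get(series);
            if (previous != null && p.time < previous) {
                return i; // timestamp went backwards for this series
            }
            lastTime.put(series, p.time);
        }
        return -1;
    }

    public static void main(String[] args) {
        // d1.m1.t1, d1.m1.t2, d1.m2.t1, d1.m2.t2, d2.m1.t1 (allowed)
        List<Point> allowed = Arrays.asList(
            new Point("d1", "m1", 1), new Point("d1", "m1", 2),
            new Point("d1", "m2", 1), new Point("d1", "m2", 2),
            new Point("d2", "m1", 1));
        // d1.m1.t2, d1.m1.t1 (not allowed)
        List<Point> rejected = Arrays.asList(
            new Point("d1", "m1", 2), new Point("d1", "m1", 1));

        System.out.println(firstOutOfOrder(allowed));
        System.out.println(firstOutOfOrder(rejected));
    }
}
```

A writer on an edge device could run such a check (or simply track the last timestamp per series) before handing points to the file writer, and drop or buffer anything that arrives late.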
