> (We need to have a merge sort when querying more than one measurement)
Users do not need to care about that, because the IoTDB/TsFile APIs already merge-sort the data for them.

-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

黄向东
清华大学 软件学院


Xiangdong Huang <saint...@gmail.com> wrote on Fri, Mar 8, 2019 at 8:40 AM:

> Hi,
>
> Yes, every Chunk has its own timestamp column. We designed it this way because
> different sensors may have different frequencies. For example, the rotation
> speed of an engine may be collected 100 times per second, while the GPS info is
> collected once per second. And their start times may not be aligned...
>
> So, if you consider the data as a table (timestamp, device name, sensor1,
> sensor2, ...), then it is a quite sparse table. Parquet introduces definition
> and repetition levels to read data row by row, but we think that is not
> so natural for time series data.
>
> As a result, we store timestamp data with each column (I mean, Chunk).
> Experience shows the disk overhead is small, and there are many advantages
> for queries. (We need to have a merge sort when querying more than one
> measurement.)
>
> Best,
> -----------------------------------
> Xiangdong Huang
> School of Software, Tsinghua University
>
> 黄向东
> 清华大学 软件学院
>
>
> Julian Feinauer <j.feina...@pragmaticminds.de> wrote on Fri, Mar 8, 2019 at 2:08 AM:
>
>> Hey Xu Yi,
>>
>> thanks for the information.
>> I checked the code and indeed I was wrong.
>> Every Chunk also stores its timestamps.
>>
>> So when I read values through a Query, all timestamps are "interpolated"
>> or merged together from all sensors, right?
>>
>> Julian
>>
>> On 07.03.19, 18:48, "Xu yi" <xuyith...@126.com> wrote:
>>
>> Hi,
>>
>> In my opinion, different measurements use their own timestamps even
>> though they are grouped into one chunk group. They don't share them
>> with each other.
>>
>> What do you think of this, @xiangdong?
>>
>> Thanks
>> XuYi
>>
>> Sent from my iPhone
>>
>> 2019/03/08 1:41, Julian Feinauer <j.feina...@pragmaticminds.de> wrote:
>>
>> > Hi,
>> >
>> > Yes, this is what I meant.
>> >
>> > Julian
>> >
>> > Sent from my mobile phone
>> >
>> >
>> > -------- Original message --------
>> > Subject: Re: Operation and robustness of iotDB
>> > From: 徐毅
>> > To: dev@iotdb.apache.org
>> > Cc:
>> >
>> > Hi,
>> > In the definition of ChunkGroup, what is the meaning of 'share one
>> > time signal'? Do these measurements share the same timestamps?
>> >
>> >
>> > Thanks
>> > XuYi
>> > On 3/8/2019 01:11, Julian Feinauer <j.feina...@pragmaticminds.de> wrote:
>> > Hey Xiangdong,
>> > hey all,
>> >
>> > I like the documentation a lot.
>> > The only thing I'm a bit unsure about is the names (as there is no
>> > clarification).
>> > So, before I update it with any wrong information, I would like to
>> > make sure that I have the correct understanding.
>> >
>> > I assume that most of the naming is similar to Parquet.
>> >
>> > Page - Contains one measurement, smallest unit of compression
>> > Chunk - Collection of multiple Pages, still one measurement
>> > ChunkGroup - Collection of Chunks which share one time signal
>> > (one Chunk for each measurement)
>> >
>> > Is this correct?
>> >
>> > Julian
>> >
>> > On 05.03.19, 12:26, "Xiangdong Huang" <saint...@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > 1. We have a document that introduces this:
>> > https://cwiki.apache.org/confluence/display/IOTDB/TsFile+Format
>> >
>> > 2. The new API for recovering data is almost done. I am writing the
>> > UTs now. Maybe I can submit a PR tonight (if everything is fine...)
>> >
>> > Best,
>> > -----------------------------------
>> > Xiangdong Huang
>> > School of Software, Tsinghua University
>> >
>> > 黄向东
>> > 清华大学 软件学院
>> >
>> >
>> > Julian Feinauer <j.feina...@pragmaticminds.de> wrote on Tue, Mar 5, 2019 at 6:00 PM:
>> >
>> > Hi Xiangdong,
>> >
>> > that sounds excellent.
>> > Do you have a short overview of how the file format is laid out on
>> > disk?
>> > I know that it's somewhat similar to Parquet but I did not find more
>> > details.
>> > Basically, what would suffice for us would be something like
>> > skipping an invalid column group (or whatever you call it) and going
>> > on with the next.
>> >
>> > Julian
>> >
>> > On 04.03.19, 13:21, "Xiangdong Huang" <saint...@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > If so, I think I need to add a new API that lets you continue to
>> > write data in an existing TsFile that was not closed correctly. Then
>> > everything is fine for you :D
>> >
>> > Best,
>> > -----------------------------------
>> > Xiangdong Huang
>> > School of Software, Tsinghua University
>> >
>> > 黄向东
>> > 清华大学 软件学院
>> >
>> >
>> > Julian Feinauer <j.feina...@pragmaticminds.de> wrote on Mon, Mar 4, 2019 at 8:08 PM:
>> >
>> > Hey Xiangdong,
>> >
>> > thanks for the great explanation.
>> > And in fact, I agree with you that it would be best if we start to
>> > play around with it and report all our findings or wishes back to
>> > this list (in fact that proved to be beneficial in plc4x as well).
>> >
>> > You confirm my thoughts about the two "levels" of APIs (DB and file),
>> > and the file API is exactly what we were looking for for our use
>> > case, as we do not care much about data loss (when an edge device
>> > fails, it's... gone).
>> > The crucial point for us is that no corrupt files can be generated.
>> > This means I'm fine when the last data submitted is lost, but I'm not
>> > fine if we can get into a situation where the last data file is
>> > completely lost (well, perhaps this could be acceptable).
>> >
>> > @tim: Perhaps it's best if you give Xiangdong some more information
>> > about our idea, and we can also point to our current code on GitHub.
>> >
>> > Julian
>> >
>> > On 04.03.19, 13:03, "Xiangdong Huang" <saint...@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > The TsFile API is not deprecated. In fact, it is designed for this
>> > scenario and for MapReduce/Spark computing.
>> >
>> > If you just use the Reader and Writer API, there is something you
>> > need to know:
>> >
>> > Let's suppose your block size is x bytes
>> > (tsfile-format.properties: group_size_in_byte).
>> >
>> > 1. If you write data and a shutdown occurs, then all data that has
>> > been flushed to disk is OK, and you can read the data (class
>> > org.apache.iotdb.tsfile.TsFileSequenceRead is an example, but you
>> > need to change it a little. I think I can write an example.)
>> >
>> > 2. Actually, TsFile has the ability to let you continue to write
>> > data at the end of the incomplete file. However, we do not provide
>> > this API now... If needed, I can add the API.
>> >
>> > 3. In this scenario, you will lose at most x bytes of data. If you
>> > cannot accept that, something like a WAL is needed. (It is not very
>> > complex, but I am not sure whether it should be an embedded
>> > function of TsFile.)
>> >
>> > Up to now, we can consider the TsFile API suitable for your
>> > scenario (even though we need to add a little more API if you
>> > desire). And you get the ability to compress data, and to query data
>> > from the TsFile rather than scan the data from head to tail.
>> >
>> > However, TsFile has one constraint: you cannot write out-of-order
>> > data into a TsFile, otherwise the query API may return incomplete
>> > results.
>> > But I think it is OK for real applications, because I do not think
>> > that a device can generate out-of-order data....
>> >
>> > For example, if you write two devices' data into one TsFile, it is
>> > OK if you write data like:
>> > - d1.t1, d1.t2, d2.t1, d2.t2, d2.t3, d1.t4, d1.t5 ....
>> > or:
>> > - d1.m1.t1, d1.m1.t2, d1.m2.t1, d1.m2.t2, d2.m1.t1 ...
>> >
>> > But you cannot write data like:
>> > - d1.m1.t2, d1.m1.t1 ...
>> >
>> > I think it is a good chance to improve TsFile to make it more
>> > suitable for real applications, so please do not hesitate to tell me
>> > more about what you think TsFile should have.
>> >
>> > Best,
>> > -----------------------------------
>> > Xiangdong Huang
>> > School of Software, Tsinghua University
>> >
>> > 黄向东
>> > 清华大学 软件学院
>> >
>> >
>> > Julian Feinauer <j.feina...@pragmaticminds.de> wrote on Mon, Mar 4, 2019 at 7:17 PM:
>> >
>> > Hi Xiangdong,
>> >
>> > thanks for the info.
>> > How is it when you use the Reader / Writer API for the TsFiles
>> > directly (or should this be considered "deprecated")?
>> > Can these files end up in a corrupted state?
>> >
>> > One situation where we have to deal with this is "at the edge",
>> > when we have devices inside large machines.
>> > Usually at the end of the shift these machines (and therefore our
>> > device) are powered off hard, so no shutdown or de-initialization is
>> > possible.
>> >
>> > Best
>> > Julian
>> >
>> > On 04.03.19, 12:14, "Xiangdong Huang" <saint...@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > IoTDB can run either on a 24/7 server or on a RaspberryPi. We have
>> > tested both scenarios.
>> >
>> > When you forcibly shut down an IoTDB instance (e.g., power off) and
>> > restart it, no data is lost (if you enable the WAL).
>> >
>> > However, currently we have not optimized the time cost of the
>> > restart process.
>> > It is an important feature that we need to work on, because we hope
>> > IoTDB can support data management both on edge devices and in the
>> > data center.
>> >
>> > And, the default configuration is not so suitable for running on
>> > an edge device.
>> > (e.g., the block size is 128MB, which is too large for a
>> > RaspberryPi and will slow down the restart process because there is
>> > too much WAL data on disk.)
>> >
>> > Best,
>> > -----------------------------------
>> > Xiangdong Huang
>> > School of Software, Tsinghua University
>> >
>> > 黄向东
>> > 清华大学 软件学院
>> >
>> >
>> > Tim Mitsch <t.mit...@pragmaticindustries.de> wrote on Mon, Mar 4, 2019 at 6:53 PM:
>> >
>> > Hello development team,
>> >
>> > First of all, thanks for developing this interesting project and
>> > bringing it into the Apache Incubator.
>> >
>> > I have a question regarding the place of operation and robustness:
>> >
>> > * Is iotDB designed as an application on a server which is running
>> > 24/7, or
>> > * Is it also possible to run it on a device like a RaspberryPi or
>> > IPC, where operation can be interrupted?
>> > I'm asking because I'm searching for a solution for temporary
>> > storage that is robust against spontaneous interruption, e.g.
>> > switching off electricity without a regular shutdown of the OS.
>> > Have you tested something like this yet?
>> >
>> > Best regards
>> > Tim
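
Illustration of the merge sort discussed at the top of this thread: a minimal sketch, assuming each measurement is available as its own list of (timestamp, value) pairs that is already sorted by time. It is plain Python for illustration only and is not the IoTDB/TsFile implementation or API; the names `tagged` and `merge_by_timestamp` and the sample values are made up for this example.

```python
import heapq


def tagged(col, points):
    """Yield (timestamp, column index, value) so each point remembers its series."""
    for ts, value in points:
        yield ts, col, value


def merge_by_timestamp(series):
    """Merge per-measurement time series into rows ordered by timestamp.

    `series` maps a measurement name to a list of (timestamp, value) pairs,
    each list already sorted by timestamp. Missing values become None,
    which is what makes the resulting table sparse.
    """
    names = list(series)
    streams = [tagged(i, series[name]) for i, name in enumerate(names)]
    rows = []
    # heapq.merge performs a k-way merge of already sorted inputs.
    for ts, col, value in heapq.merge(*streams, key=lambda p: p[:2]):
        if not rows or rows[-1][0] != ts:
            rows.append([ts] + [None] * len(names))
        rows[-1][col + 1] = value
    return names, rows


if __name__ == "__main__":
    names, rows = merge_by_timestamp({
        "speed": [(0, 10.0), (10, 10.5), (20, 11.0)],  # sampled often
        "gps": [(5, "a"), (20, "b")],                  # sampled rarely
    })
    print(["time"] + names)
    for row in rows:
        print(row)
```

Because every series carries its own timestamps, the two measurements only line up in one row when they happen to share a timestamp (here at time 20); all other rows have a gap in one column.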