Re: Operation and robustness of iotDB

徐毅 Thu, 07 Mar 2019 09:33:08 -0800

Hi,
In the definition of ChunkGroup, what does 'share one time signal' mean?
Do these measurements share the same timestamps?



Thanks
XuYi
On 3/8/2019 01:11, Julian Feinauer <[email protected]> wrote:
Hey Xiangdong,
hey all,

I like the documentation a lot.
The only thing I'm a bit unsure about is the naming (as there is no
clarification).
So, before I update it with any wrong information, I would like to make sure
that I have understood it correctly.

I assume that most naming is similar to Parquet.

Page - Contains one measurement, smallest unit of compression
Chunk - Collection of multiple Pages, still one measurement
ChunkGroup - Collection of Chunks which share one time signal (one Chunk for
each measurement)

Is this correct?
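
A minimal sketch of the hierarchy as described in the list above, written in Python for brevity. The class and field names are illustrative and are not TsFile's actual classes; tying a ChunkGroup to a single device is an assumption. Whether the Chunks in a group share timestamps is deliberately left open: each Page simply stores its own (timestamp, value) pairs.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Page:
    """Smallest unit of compression; holds points of one measurement."""
    points: List[Tuple[int, float]] = field(default_factory=list)  # (timestamp, value)


@dataclass
class Chunk:
    """Collection of Pages that all belong to the same measurement."""
    measurement: str
    pages: List[Page] = field(default_factory=list)

    def point_count(self) -> int:
        return sum(len(page.points) for page in self.pages)


@dataclass
class ChunkGroup:
    """One Chunk per measurement (assumed here: all from one device)."""
    device: str
    chunks: List[Chunk] = field(default_factory=list)

    def measurements(self) -> List[str]:
        return [chunk.measurement for chunk in self.chunks]


# Example: one device with two measurements, each stored in its own Chunk.
group = ChunkGroup(
    device="d1",
    chunks=[
        Chunk("m1", [Page([(1, 20.5), (2, 20.7)]), Page([(3, 21.0)])]),
        Chunk("m2", [Page([(1, 0.3), (2, 0.4)])]),
    ],
)
```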

Julian

On 05.03.19, 12:26, "Xiangdong Huang" <[email protected]> wrote:

Hi,

1. We have a document that describes the format:
https://cwiki.apache.org/confluence/display/IOTDB/TsFile+Format

2. The new API for recovering data is almost done. I am writing the unit
tests now. Maybe I can submit a PR tonight (if everything is fine...)

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

黄向东
清华大学 软件学院


On Tue, Mar 5, 2019 at 6:00 PM, Julian Feinauer <[email protected]> wrote:

Hi Xiangdong,

that sounds excellent.
Do you have a short overview of how the file format is laid out on disk?
I know that it's somewhat similar to Parquet, but I did not find more
details.
Basically, what would suffice for us is something like skipping an invalid
column group (or whatever you call it) and going on with the next one.
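
The "skip an invalid group and go on with the next" idea can be sketched as follows. The framing used here (a length prefix plus a CRC32 per group) is purely hypothetical and is not the TsFile on-disk layout; `encode_group` and `read_valid_groups` are made-up helper names. Skipping only works in this sketch as long as the length field itself is intact.

```python
import struct
import zlib

# Hypothetical header: payload length and CRC32 of the payload, big-endian.
HEADER = struct.Struct(">II")


def encode_group(payload: bytes) -> bytes:
    """Frame one group as header + payload."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def read_valid_groups(data: bytes) -> list:
    """Return payloads whose checksum matches; skip the others."""
    groups = []
    offset = 0
    while offset + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(data):
            break  # truncated tail, e.g. after a power cut mid-write
        payload = data[start:end]
        if zlib.crc32(payload) == crc:
            groups.append(payload)
        # On a checksum mismatch the group is skipped and reading continues.
        offset = end
    return groups


intact = encode_group(b"aaaa") + encode_group(b"bbbb") + encode_group(b"cccc")

# Corrupt the first payload byte of the second group.
damaged = bytearray(intact)
damaged[2 * HEADER.size + 4] ^= 0xFF
recovered = read_valid_groups(bytes(damaged))

# Cut the file short, as a hard power-off might.
truncated = read_valid_groups(intact[:-2])
```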

Julian

On 04.03.19, 13:21, "Xiangdong Huang" <[email protected]> wrote:

Hi,

If so, I think I need to add a new API that allows you to continue writing
data to an existing TsFile that was not closed correctly. Then everything is
fine for you :D

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

黄向东
清华大学 软件学院


On Mon, Mar 4, 2019 at 8:08 PM, Julian Feinauer <[email protected]> wrote:

Hey Xiangdong,

thanks for the great explanation.
And in fact, I agree with you that it would be best if we start to play
around with it and report all our findings or wishes back to this list (in
fact, that proved to be beneficial in plc4x as well).

You confirm my thoughts about the two "levels" of APIs (DB and file), and
the file API is exactly what we were looking for for our use case.
We do not care much about data loss (when an edge device fails, it's...
gone).
The crucial point for us is that no corrupt files can be generated.
This means I'm fine when the last data submitted is lost, but I'm not fine
if we can get into a situation where the last data file is completely lost
(well, perhaps this could be acceptable).

@tim: Perhaps it's best if you give Xiangdong some more information about
our idea, and we can also point to our current code on GitHub.

Julian

On 04.03.19, 13:03, "Xiangdong Huang" <[email protected]> wrote:

Hi,

The TsFile API is not deprecated. In fact, it is designed for this scenario
and for MapReduce/Spark computing.

If you just use the Reader and Writer API, there is something you need to
know:

Let's suppose your block size is x bytes (tsfile-format.properties:
group_size_in_byte).

1. If you write data and a shutdown occurs, then all data that has been
flushed to disk is OK, and you can read it (the class
org.apache.iotdb.tsfile.TsFileSequenceRead is an example, but you need to
change it a little. I think I can write an example.)

2. Actually, TsFile is able to let you continue writing data at the end of
the incomplete file. However, we do not provide this API now...
If needed, I can add the API.

3. In this scenario, you will lose at most x bytes of data. If you cannot
accept that, something like a WAL is needed. (It is not very complex, but I
am not sure whether it should be a built-in function of TsFile.)
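
The "at most x bytes" bound in point 3 can be illustrated with a toy writer that buffers records in memory and flushes once the buffer reaches the group size. This is only a model of the behaviour described above, not the TsFile writer; `GroupBufferedWriter` and its methods are hypothetical names, and the flush-on-threshold policy is an assumption.

```python
class GroupBufferedWriter:
    """Toy model: records are buffered and flushed in group-sized batches."""

    def __init__(self, group_size_in_byte: int):
        self.group_size = group_size_in_byte
        self.buffer = bytearray()   # not yet durable
        self.on_disk = bytearray()  # survives a power cut

    def write(self, record: bytes) -> None:
        self.buffer += record
        # Assumed policy: flush as soon as the buffer reaches the group size.
        if len(self.buffer) >= self.group_size:
            self.flush()

    def flush(self) -> None:
        self.on_disk += self.buffer
        self.buffer.clear()

    def crash(self) -> int:
        """Simulate a hard power-off; return the number of bytes lost."""
        lost = len(self.buffer)
        self.buffer.clear()
        return lost


writer = GroupBufferedWriter(group_size_in_byte=64)
for _ in range(10):
    writer.write(b"0123456789")  # ten records of 10 bytes each

flushed_bytes = len(writer.on_disk)
lost_bytes = writer.crash()
```

In this model the unflushed buffer is always smaller than the group size after a write returns, so a crash never loses more than one group's worth of data.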

So far, we can consider the TsFile API suitable for your scenario (even
though we need to add a little more API if you want it). You also get the
ability to compress data and to query data from the TsFile rather than
scanning it from head to tail.

However, TsFile has one constraint: you cannot write out-of-order data into
a TsFile, otherwise the query API may return incomplete results.
But I think this is OK for real applications, because I do not think that a
device can generate out-of-order data....

For example, if you write two devices' data into one TsFile, it is OK if you
write data like:
- d1.t1, d1.t2, d2.t1, d2.t2, d2.t3, d1.t4, d1.t5 ....
or:
- d1.m1.t1, d1.m1.t2, d1.m2.t1, d1.m2.t2, d2.m1.t1 ...

But you cannot write data like:
- d1.m1.t2, d1.m1.t1 ...
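
The ordering constraint above can be checked on the client side before writing. The following is a minimal sketch, assuming each point is a (device, measurement, timestamp) tuple; `find_out_of_order` is a hypothetical helper, not part of the TsFile API. It also treats a repeated timestamp as out of order, which is an assumption the thread does not address.

```python
from typing import Dict, List, Tuple

Point = Tuple[str, str, int]  # (device, measurement, timestamp)


def find_out_of_order(points: List[Point]) -> List[Point]:
    """Return points whose timestamp does not advance within their series."""
    last_seen: Dict[Tuple[str, str], int] = {}
    violations = []
    for device, measurement, timestamp in points:
        series = (device, measurement)
        previous = last_seen.get(series)
        if previous is not None and timestamp <= previous:
            violations.append((device, measurement, timestamp))
        else:
            last_seen[series] = timestamp
    return violations


# Interleaving devices and measurements is fine, as in the examples above.
accepted = find_out_of_order([
    ("d1", "m1", 1), ("d1", "m1", 2),
    ("d1", "m2", 1), ("d1", "m2", 2),
    ("d2", "m1", 1),
])

# Going back in time within one series is not.
rejected = find_out_of_order([("d1", "m1", 2), ("d1", "m1", 1)])
```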

I think this is a good chance to improve TsFile and make it more suitable
for real applications, so please do not hesitate to tell me more about what
you think TsFile should have.

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

黄向东
清华大学 软件学院


On Mon, Mar 4, 2019 at 7:17 PM, Julian Feinauer <[email protected]> wrote:

Hi Xiangdong,

thanks for the info.
How is it when you use the Reader / Writer API for the TsFiles directly (or
should this be considered "deprecated")?
Can these files end up in a corrupted state?

One place where we have to deal with such situations is "at the edge", when
we have devices inside large machines.
Usually at the end of the shift these machines (and therefore our devices)
are powered off hard, so no shutdown or de-initialization is possible.

Best
Julian

On 04.03.19, 12:14, "Xiangdong Huang" <[email protected]> wrote:

Hi,

IoTDB can run either on a server operating 24/7 or on a Raspberry Pi. We
have tested both scenarios.

When you forcibly shut down an IoTDB instance (e.g., power off) and restart
it, no data is lost (if you enable the WAL).

However, we have not yet optimized the time cost of the restart process.
This is an important feature that we need to work on, because we hope IoTDB
can support data management both on edge devices and in the data center.

Also, the default configuration is not well suited to running on an edge
device (e.g., the block size is 128MB, which is too large for a Raspberry
Pi, and it will slow down the restart process because there is too much WAL
data on disk).

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

黄向东
清华大学 软件学院


On Mon, Mar 4, 2019 at 6:53 PM, Tim Mitsch <[email protected]> wrote:

Hello development team,

First of all, thanks for developing this interesting project and bringing it
into the Apache Incubator.

I have a question regarding the place of operation and robustness:

*   Is IoTDB designed as an application on a server that runs 24/7, or
*   is it also possible to run it on a device like a Raspberry Pi or an IPC,
where operation can be interrupted?

I'm asking because I'm searching for a solution for temporary storage that
is robust against spontaneous interruption, e.g. switching off the
electricity without a regular shutdown of the OS. Have you tested something
like this yet?

Best regards
Tim
