This is an automated email from the ASF dual-hosted git repository.
haonan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/iotdb.git
The following commit(s) were added to refs/heads/master by this push:
new 8289372 Update tsdb comparison doc (#2601)
8289372 is described below
commit 8289372279ef9e14763b8bb40bb81ba7e1bdac3b
Author: zhanglingzhe0820 <[email protected]>
AuthorDate: Sat Feb 20 09:44:19 2021 +0800
Update tsdb comparison doc (#2601)
* update TSDB-Comparison doc
Co-authored-by: zhanglingzhe <[email protected]>
Co-authored-by: Haonan <[email protected]>
---
docs/UserGuide/Comparison/TSDB-Comparison.md | 107 ++++++++++++---------------
1 file changed, 48 insertions(+), 59 deletions(-)
diff --git a/docs/UserGuide/Comparison/TSDB-Comparison.md
b/docs/UserGuide/Comparison/TSDB-Comparison.md
index cb9c23e..fcd1826 100644
--- a/docs/UserGuide/Comparison/TSDB-Comparison.md
+++ b/docs/UserGuide/Comparison/TSDB-Comparison.md
@@ -43,7 +43,7 @@ However, few of them are developed for IoT or IIoT
(Industrial IoT) scenario in
Interface: Restful API
-* TimesacleDB - Time series database based on Relational Database
+* TimescaleDB - Time series database based on Relational Database
Interface: SQL
@@ -284,101 +284,90 @@ It is somehow right. But, if you consider the
performance, you may change your m
#### quick review
-Given a workload:
-
* Write:
-10 clients write data concurrently. The number of storage group is 50. There
are 1000 devices and each device has 100 measurements (i.e.,, 100K time series
totally).
-The data type is float and IoTDB uses RLE encoding and Snappy compression.
-IoTDB uses batch insertion API and the batch size is 100 (write 100 data
points per write API call).
+We test the performance of writing from two aspects: *batch size* and *client
num*. The number of storage groups is 10. There are 1000 devices and each device
has 100 measurements (i.e., 100K time series in total).
* Read:
-50 clients read data concurrently. Each client just read data from 1 device
with 10 measurements in one storage group.
+10 clients read data concurrently. The number of storage groups is 10. There
are 10 devices and each device has 10 measurements (i.e., 100 time series in
total).
+The data type is *double* and the encoding type is *GORILLA*.
-IoTDB is v0.9.0.
+The IoTDB version is v0.11.1.
**Write performance**:
-We write 112GB data totally.
-
-The write throughput (points/second) is:
+* batch size:
-
-<span id = "exp1"> <center>Figure 1. Write throughput (points/second) IoTDB
v0.9</center></span>
+10 clients write data concurrently.
+IoTDB uses the batch insertion API and the batch size ranges from 1ms to
1min (write N data points per write API call).
+The write throughput (points/second) is:
-The disk occupation is:
+
+<span id = "exp1"> <center>Figure 1. Batch Size with Write throughput
(points/second) IoTDB v0.11.1</center></span>
-
-<center>Figure 2. Disk occupation(GB) IoTDB v0.9</center>
-**Query performance**
+The write delay (ms) is:
-
-<center>Figure 3. Aggregation query time cost(ms) IoTDB v0.9</center>
+
+<center>Figure 2. Batch Size with Write Delay (ms) IoTDB v0.11.1</center>
-We can see that IoTDB outperforms others.
+* client num:
+The number of clients ranges from 1 to 50.
+IoTDB uses the batch insertion API and the batch size is 100 (write 100 data
points per write API call).
-#### More details
-
-We provide a benchmarking tool, called IoTDB-benchamrk
(https://github.com/thulab/iotdb-benchmark, you may have to use the dev branch
to compile it),
-it supports IoTDB, InfluxDB, KairosDB, TimescaleDB, OpenTSDB. We have a
[article](https://arxiv.org/abs/1901.08304) for comparing these systems using
the benchmark tool.
-When we publish the article, IoTDB just entered Apache incubator, so we
deleted the performance of IoTDB in that article. But after comparison, some
results are presented here.
-
-- **IoTDB: 0.8.0**. (notice: **IoTDB v0.9 outperforms than v0.8**, the result
will be updated once experiments on v0.9 are finished)
-- InfluxDB: 1.5.1.
-- OpenTSDB: 2.3.1 (HBase 1.2.8)
-- KairosDB: 1.2.1 (Cassandra 3.11.3)
-- TimescaleDB: 1.0.0 (PostgreSQL 10.5)
+The write throughput (points/second) is:
-All TSDB run on the same server one by one.
+
+<center>Figure 3. Client Num with Write Throughput (points/second) IoTDB
v0.11.1</center>
-- For InfluxDB, we set the cache-max-memory-size and max-series-perbase as
unlimited (otherwise it will be timeout quickly)
+**Query performance**
-- For OpenTSDB, we modified tsd.http.request.enable_chunked,
tsd.http.request.max_chunk and tsd.storage.fix_duplicates for supporting write
data in batch
-and write out-of-order data.
+
+<center>Figure 4. Raw data query 1 col time cost(ms) IoTDB v0.11.1</center>
-- For KairosDB, we set Cassandra's read_repair_chance as 0.1 (However it has
no effect because we just have one node).
+
+<center>Figure 6. Aggregation query time cost(ms) IoTDB v0.11.1</center>
-- For TimescaleDB, we use PGTune tool to optimize PostgreSQL.
+
+<center>Figure 7. Downsampling query time cost(ms) IoTDB v0.11.1</center>
-All TSDBs run on a server with Intel Xeon CPU E5-2697 v4 @2.3GHz, 256GB memory
and 10 HDD disks with RAID-5.
-The OS is Ubuntu 16.04.2 LTS, 64bits.
+
+<center>Figure 8. Latest query time cost(ms) IoTDB v0.11.1</center>
-Another server run IoTDB benchmark tool.
+We can see that IoTDB outperforms others.
-I omit the detailed workload here, let's see the result:
-Legend:
-- I: InfluxDB
-- O: OpenTSDB
-- T: TimescaleDB
-- K: KairosDB
-- **D: IoTDB**
+#### More details
-
-<span id = "exp4"><center>Figure 4. Write experiments IoTDB
v0.8.0</center></span>
+We provide a benchmarking tool called IoTDB-benchmark
(https://github.com/thulab/iotdb-benchmark, you may have to use the dev branch
to compile it).
+It supports IoTDB, InfluxDB, KairosDB, TimescaleDB, and OpenTSDB. We have an
[article](https://arxiv.org/abs/1901.08304) comparing these systems using
the benchmark tool.
+When we published the article, IoTDB had just entered the Apache Incubator, so
we removed the IoTDB performance results from that article. Some of the
comparison results are presented here instead.
-
-<center>Figure 5. Query experiments IoTDB v0.8.0</center>
-We can see that IoTDB outperforms others hugely.
+- For InfluxDB, we set cache-max-memory-size and max-series-per-database to
unlimited (otherwise it times out quickly).
-In [Figure. 4(c)](#exp4), when the batch size reaches to 10000 points,
InfluxDB is better than IoTDB v0.8.
-It is because in IoTDB v0.8, batch insert API is not optimized.
-
-From IoTDB v0.9 on, using batch insert API can obtain 8 to 10 times write
performance improvement.
+- For KairosDB, we set Cassandra's read_repair_chance to 0.1 (however, it has
no effect because we have only one node).
+- For TimescaleDB, we use the PGTune tool to optimize PostgreSQL.
-For example, using IoTDB v0.8, the write throughput can only reach to 6
million data points per second.
-But using IoTDB v0.9, the write throughput can reach to 40 million data points
per second on the same server with the same workload.
-(see [Figure. 4(a)](#exp4) vs [Figure. 1](#exp1)).
+All TSDBs run on a server with an Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz (8
cores, 16 threads), 32GB memory, a 256GB SSD and a 10TB HDD.
+The OS is Ubuntu 16.04.7 LTS, 64-bit.
+All clients run on a server with an Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (6
cores, 12 threads), 16GB memory and a 256GB SSD.
+The OS is Ubuntu 16.04.7 LTS, 64-bit.
## Conclusion
+From all the experiments above, we can see that IoTDB significantly
outperforms the other systems.
+IoTDB has the lowest write latency. The larger the batch size, the higher the
write throughput of IoTDB. This indicates that IoTDB is most suitable for batch
data writing scenarios.
+In high-concurrency scenarios, IoTDB also maintains steady growth in
throughput (12 million points per second may have reached the limit of the
gigabit network card).
+In raw data queries, the advantages of IoTDB begin to show as the query scope
increases. Because the granularity of data blocks is larger and the data is
stored by column, column-based compression and columnar iterators both
accelerate the query.
+In aggregation queries, we use file-level statistics and cache them.
Repeated queries therefore only need in-memory calculations (they neither
traverse the original data points nor access the disk), so the aggregation
performance advantage is obvious.
+Downsampling queries are more interesting: as the time partition becomes
larger, the query performance of IoTDB increases gradually. It appears to rise
twice, which corresponds to pre-calculated information at 2 granularities (3
hours and 4.5 days), so queries over ranges of 1 day and 1 week are each
accelerated. The other databases rise only once, indicating that they keep
statistics at only one granularity.
+
If you are considering a TSDB for your IIoT application, Apache IoTDB, a new
time series database, is your best choice.
We will update this page once we release a new version and finish the
experiments.
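
The conclusion's remark that 12 million points per second may have reached the limit of a gigabit network card can be sanity-checked with rough arithmetic. The sketch below is only an illustration, not part of the benchmark: it assumes a nominal 1 Gbit/s link with no protocol overhead, and it assumes an unencoded point is a 64-bit timestamp plus a 64-bit double. Neither assumption comes from the commit.

```python
# Back-of-the-envelope check; every constant here is an assumption, not a measurement.
LINK_BITS_PER_SECOND = 1_000_000_000  # nominal gigabit Ethernet, protocol overhead ignored
POINTS_PER_SECOND = 12_000_000        # throughput figure quoted in the conclusion
RAW_POINT_BYTES = 8 + 8               # assumed: 64-bit timestamp + 64-bit double, unencoded


def bytes_per_point_budget(link_bits_per_second, points_per_second):
    """Return how many bytes each point may occupy on the wire at the given rate."""
    return link_bits_per_second / 8 / points_per_second


budget = bytes_per_point_budget(LINK_BITS_PER_SECOND, POINTS_PER_SECOND)
print(f"wire budget: {budget:.1f} bytes per point")
print(f"assumed raw size: {RAW_POINT_BYTES} bytes per point")
```

Under these assumptions the budget is roughly 10 bytes per point, which is less than the assumed raw size, so sustaining that rate would already need a payload more compact than the raw representation. That is consistent with the network link being the bottleneck, though it does not prove it.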