[iotdb] branch master updated: Update tsdb comparison doc (#2601)

haonan Fri, 19 Feb 2021 17:44:45 -0800

This is an automated email from the ASF dual-hosted git repository.

haonan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/iotdb.git



The following commit(s) were added to refs/heads/master by this push:
     new 8289372  Update tsdb comparison doc (#2601)
8289372 is described below

commit 8289372279ef9e14763b8bb40bb81ba7e1bdac3b
Author: zhanglingzhe0820 <[email protected]>
AuthorDate: Sat Feb 20 09:44:19 2021 +0800

    Update tsdb comparison doc (#2601)
    
    * update TSDB-Comparison doc
    
    Co-authored-by: zhanglingzhe <[email protected]>
    Co-authored-by: Haonan <[email protected]>
---
 docs/UserGuide/Comparison/TSDB-Comparison.md | 107 ++++++++++++---------------
 1 file changed, 48 insertions(+), 59 deletions(-)

diff --git a/docs/UserGuide/Comparison/TSDB-Comparison.md b/docs/UserGuide/Comparison/TSDB-Comparison.md
index cb9c23e..fcd1826 100644
--- a/docs/UserGuide/Comparison/TSDB-Comparison.md
+++ b/docs/UserGuide/Comparison/TSDB-Comparison.md
@@ -43,7 +43,7 @@ However, few of them are developed for IoT or IIoT (Industrial IoT) scenario in
   
   Interface: Restful API
 
-* TimesacleDB - Time series database based on Relational Database
+* TimescaleDB - Time series database based on Relational Database
 
   Interface: SQL
 
@@ -284,101 +284,90 @@ It is somehow right. But, if you consider the performance, you may change your m
 
 #### quick review
 
-Given a workload:
-
 * Write:
 
-10 clients write data concurrently. The number of storage group is 50. There are 1000 devices and each device has 100 measurements (i.e.,, 100K time series totally).
-The data type is float and IoTDB uses RLE encoding and Snappy compression. 
-IoTDB uses batch insertion API and the batch size is 100 (write 100 data points per write API call).
+We test the performance of writing from two aspects: *batch size* and *client num*. The number of storage group is 10. There are 1000 devices and each device has 100 measurements(i.e.,, 100K time series total).
 
 * Read:
 
-50 clients read data concurrently. Each client just read data from 1 device with 10 measurements in one storage group.
+10 clients read data concurrently. The number of storage group is 10. There are 10 devices and each device has 10 measurements (i.e.,, 100 time series total).
+The data type is *double*, encoding type is *GORILLA*
 
-IoTDB is v0.9.0.
+The IoTDB version is v0.11.1.
 
 **Write performance**:
 
-We write 112GB data totally.
-
-The write throughput (points/second) is:
+* batch size:
 
-![Write Throughput (points/second)](https://user-images.githubusercontent.com/1021782/80472896-f1db0e00-8977-11ea-9424-96bf0021588d.png)
-<span id = "exp1"> <center>Figure 1. Write throughput (points/second) IoTDB v0.9</center></span>
+10 clients write data concurrently.
+IoTDB uses batch insertion API and the batch size is distributed from 1ms to 1min (write N data points per write API call).
 
+The write throughput (points/second) is:
 
-The disk occupation is:
+![Batch Size with Write Throughput (points/second)](https://user-images.githubusercontent.com/24886743/106254214-6cacbe80-6253-11eb-8532-d6a1829f8f66.png)
+<span id = "exp1"> <center>Figure 1. Batch Size with Write throughput (points/second) IoTDB v0.11.1</center></span>
 
-![Disk Occupation](https://user-images.githubusercontent.com/1021782/80472899-f3a4d180-8977-11ea-8233-268ad4e3713e.png)
-<center>Figure 2. Disk occupation(GB) IoTDB v0.9</center>
 
-**Query performance**
+The write delay (ms) is:
 
-![Aggregation query](https://user-images.githubusercontent.com/1021782/80472924-fef7fd00-8977-11ea-9ad4-b4d3c899605e.png)
-<center>Figure 3. Aggregation query time cost(ms) IoTDB v0.9</center>
+![Batch Size with Write Delay (ms)](https://user-images.githubusercontent.com/24886743/106251391-df1b9f80-624f-11eb-9f1f-66823839acba.png)
+<center>Figure 2. Batch Size with Write Delay (ms) IoTDB v0.11.1</center>
 
-We can see that IoTDB outperforms others. 
+* client num:
 
+The client num is distributed from 1 to 50.
+IoTDB uses batch insertion API and the batch size is 100 (write 100 data points per write API call).
 
-#### More details
-
-We provide a benchmarking tool, called IoTDB-benchamrk (https://github.com/thulab/iotdb-benchmark, you may have to use the dev branch to compile it),
-it supports IoTDB, InfluxDB, KairosDB, TimescaleDB, OpenTSDB. We have a [article](https://arxiv.org/abs/1901.08304) for comparing these systems using the benchmark tool.
-When we publish the article, IoTDB just entered Apache incubator, so we deleted the performance of IoTDB in that article. But after comparison, some results are presented here.
-
-- **IoTDB: 0.8.0**. (notice: **IoTDB v0.9 outperforms than v0.8**, the result will be updated once experiments on v0.9 are finished)
-- InfluxDB: 1.5.1.
-- OpenTSDB: 2.3.1 (HBase 1.2.8)
-- KairosDB: 1.2.1 (Cassandra 3.11.3)
-- TimescaleDB: 1.0.0 (PostgreSQL 10.5)
+The write throughput (points/second) is:
 
-All TSDB run on the same server one by one. 
+![Client Num with Write Throughput (points/second) (ms)](https://user-images.githubusercontent.com/24886743/106251411-e5aa1700-624f-11eb-8ca8-00c0627b1e96.png)
+<center>Figure 3. Client Num with Write Throughput (points/second) IoTDB v0.11.1</center>
 
-- For InfluxDB, we set the cache-max-memory-size  and max-series-perbase as unlimited (otherwise it will be timeout quickly)
+**Query performance**
 
-- For OpenTSDB, we modified tsd.http.request.enable_chunked, tsd.http.request.max_chunk and tsd.storage.fix_duplicates for supporting write data in batch
-and write out-of-order data.
+![Raw data query 1 col](https://user-images.githubusercontent.com/24886743/106251377-daef8200-624f-11eb-9678-b1d5440be2de.png)
+<center>Figure 4. Raw data query 1 col time cost(ms) IoTDB v0.11.1</center>
 
-- For KairosDB, we set Cassandra's read_repair_chance as 0.1 (However it has no effect because we just have one node).
+![Aggregation query](https://user-images.githubusercontent.com/24886743/106251336-cf03c000-624f-11eb-8395-de5e349f47b5.png)
+<center>Figure 6. Aggregation query time cost(ms) IoTDB v0.11.1</center>
 
-- For TimescaleDB, we use PGTune tool to optimize PostgreSQL.
+![Downsampling query](Query.pnghttps://user-images.githubusercontent.com/24886743/106251353-d32fdd80-624f-11eb-80c1-fdb4197939fe.png)
+<center>Figure 7. Downsampling query time cost(ms) IoTDB v0.11.1</center>
 
-All TSDBs run on a server with Intel Xeon CPU E5-2697 v4 @2.3GHz, 256GB memory and 10 HDD disks with RAID-5.
-The OS is Ubuntu 16.04.2 LTS, 64bits.
+![Latest query](https://user-images.githubusercontent.com/24886743/106251369-d7f49180-624f-11eb-9d19-fc7341582b90.png)
+<center>Figure 8. Latest query time cost(ms) IoTDB v0.11.1</center>
 
-Another server run IoTDB benchmark tool.
+We can see that IoTDB outperforms others. 
 
-I omit the detailed workload here, let's see the result:
 
-Legend: 
-- I: InfluxDB
-- O: OpenTSDB
-- T: TimescaleDB
-- K: KairosDB
-- **D: IoTDB**
+#### More details
 
-![Write experiments](https://user-images.githubusercontent.com/1021782/80476160-95c6b880-897c-11ea-9bb3-9d810cc0c79e.png)
-<span id = "exp4"><center>Figure 4. Write experiments IoTDB v0.8.0</center></span>
+We provide a benchmarking tool, called IoTDB-benchamrk (https://github.com/thulab/iotdb-benchmark, you may have to use the dev branch to compile it),
+it supports IoTDB, InfluxDB, KairosDB, TimescaleDB, OpenTSDB. We have an [article](https://arxiv.org/abs/1901.08304) for comparing these systems using the benchmark tool.
+When we publish the article, IoTDB just entered Apache incubator, so we deleted the performance of IoTDB in that article. But after comparison, some results are presented here.
 
-![Query experiments](https://user-images.githubusercontent.com/1021782/80476181-9c553000-897c-11ea-8170-4768134f5841.png)
-<center>Figure 5. Query experiments IoTDB v0.8.0</center>
 
-We can see that IoTDB outperforms others hugely.
+- For InfluxDB, we set the cache-max-memory-size and the max-series-perbase as unlimited (otherwise it will be timeout quickly).
 
-In [Figure. 4(c)](#exp4), when the batch size reaches to 10000 points, InfluxDB is better than IoTDB v0.8.
-It is because in IoTDB v0.8, batch insert API is not optimized.
- 
-From IoTDB v0.9 on, using batch insert API can obtain 8 to 10 times write performance improvement. 
+- For KairosDB, we set Cassandra's read_repair_chance as 0.1 (However it has no effect because we just have one node).
 
+- For TimescaleDB, we use PGTune tool to optimize PostgreSQL.
 
-For example, using IoTDB v0.8, the write throughput can only reach to 6 million data points per second. 
-But using IoTDB v0.9, the write throughput can reach to 40 million data points per second on the same server with the same workload.
-(see [Figure. 4(a)](#exp4) vs [Figure. 1](#exp1)).
+All TSDBs run on a server with Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz，(8 cores 16 threads), 32GB memory , 256G SSD and 10T HDD.
+The OS is Ubuntu 16.04.7 LTS, 64bits.
 
+All clients run on a server with Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz，(6 cores 12 threads), 16GB memory , 256G SSD.
+The OS is Ubuntu 16.04.7 LTS, 64bits.
 
 ## Conclusion
 
+From all above experiments, we can see that IoTDB outperforms others hugely.
+IoTDB has the minimal write latency. The larger the batch size, the higher the write throughput of IoTDB. This indicates that IoTDB is most suitable for batch data writing scenarios.
+In high concurrency scenarios, IoTDB can also maintain a steady growth in throughput. (12 million points per second may have reached the limit of gigabit network card)
+In raw data query, as the query scope increases, the advantages of IoTDB begin to manifest. Because the granularity of data blocks is larger and the advantages of columnar storage are used, column-based compression and columnar iterators will both accelerate the query.
+In aggregation query, we use the statistics of the file layer and cache the statistics. Therefore, multiple queries only need to perform memory calculations (do not need to traverse the original data points, and do not need to access the disk), so the aggregation performance advantage is obvious.
+Downsampling query scenarios is more interesting, as the time partition becomes larger and larger, the query performance of IoTDB increases gradually. Probably it has risen twice, which corresponds to the pre-calculated information of 2 granularities(3 hours and 4.5 days). Therefore, the queries in the range of 1 day and 1 week are accelerated respectively. The other databases only rose once, indicating that they only have one granular statistics.
+
 If you are considering a TSDB for your IIoT application, Apache IoTDB, a new time series, is your best choice.
 
 We will update this page once we release new version and finish the experiments.
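
Note on the aggregation claim in the added Conclusion text: the idea that cached file-layer statistics let a query skip the original data points can be illustrated with a minimal sketch. This is not IoTDB code. `ChunkStats`, `build_chunks` and `aggregate` are hypothetical names, and the fixed-size chunking is an assumption made only for illustration. The sketch answers a count/sum query from precomputed per-chunk statistics and scans raw points only in chunks that the time range covers partially.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[int, float]  # (timestamp, value)


@dataclass(frozen=True)
class ChunkStats:
    """Hypothetical per-chunk statistics, computed once at write time."""
    start: int    # first timestamp in the chunk
    end: int      # last timestamp in the chunk
    count: int    # number of points in the chunk
    total: float  # sum of values in the chunk


def build_chunks(points: List[Point], chunk_size: int) -> List[Tuple[ChunkStats, List[Point]]]:
    """Split time-ordered points into chunks and precompute their statistics."""
    chunks = []
    for i in range(0, len(points), chunk_size):
        block = points[i:i + chunk_size]
        stats = ChunkStats(block[0][0], block[-1][0], len(block),
                           sum(v for _, v in block))
        chunks.append((stats, block))
    return chunks


def aggregate(chunks, t_min: int, t_max: int) -> Tuple[int, float, int]:
    """Return (count, sum, raw_points_scanned) for t_min <= t <= t_max."""
    count, total, scanned = 0, 0.0, 0
    for stats, block in chunks:
        if stats.end < t_min or stats.start > t_max:
            continue  # chunk lies outside the range: skip it entirely
        if t_min <= stats.start and stats.end <= t_max:
            # Fully covered chunk: answer from statistics, no raw scan.
            count += stats.count
            total += stats.total
        else:
            # Partially covered chunk: fall back to scanning its points.
            for t, v in block:
                scanned += 1
                if t_min <= t <= t_max:
                    count += 1
                    total += v
    return count, total, scanned


points = [(t, float(t)) for t in range(100)]
chunks = build_chunks(points, chunk_size=10)
result = aggregate(chunks, 5, 94)
```

In this toy data set, a query over timestamps 5 to 94 touches raw points only in the first and last chunk; the eight chunks in between are answered from statistics alone.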
