[jira] [Commented] (IOTDB-4473) Support TSBS Timeseries Benchmark

2022-11-22 Thread Xingyu Liu (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-4473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17637518#comment-17637518
 ] 

Xingyu Liu commented on IOTDB-4473:
---

The documentation is now complete.
Development design document: https://apache-iotdb.feishu.cn/docx/Q4TydTObJofXTTxuF1Icuq4Knlh
User manual: https://apache-iotdb.feishu.cn/docx/EQjFdAdL9oBvYqx0eKrcH0RYn6b

> Support TSBS Timeseries Benchmark
> -
>
> Key: IOTDB-4473
> URL: https://issues.apache.org/jira/browse/IOTDB-4473
> Project: Apache IoTDB
> Issue Type: New Feature
> Reporter: Julian Feinauer
> Assignee: Xingyu Liu
> Priority: Major
> Attachments: image-2022-11-06-19-37-51-033.png, 
> image-2022-11-17-22-05-53-013.png
>
>
> The tsbs benchmark was initially created by InfluxDB and is now maintained by 
> TimescaleDB. It compares many state-of-the-art timeseries databases: 
> [https://github.com/timescale/tsbs]
>  
> IMHO we should add support for IoTDB.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (IOTDB-4473) Support TSBS Timeseries Benchmark

2022-11-17 Thread Xingyu Liu (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-4473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635585#comment-17635585
 ] 

Xingyu Liu commented on IOTDB-4473:
---

New feature: users can now export the dataset to a CSV file. The CSV is aligned by time series; for the format, see: https://iotdb.apache.org/zh/UserGuide/Master/Write-Data/CSV-Tool.html#%E4%BD%BF%E7%94%A8import-csv-sh



[jira] [Commented] (IOTDB-4473) Support TSBS Timeseries Benchmark

2022-11-17 Thread Xingyu Liu (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-4473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17635368#comment-17635368
 ] 

Xingyu Liu commented on IOTDB-4473:
---

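I ran a small experiment to confirm that the changes to how tsbs inserts data and handles tags do improve performance.
Old approach: treat tags as fields.
New approach: for each storage group, add a {{_tags}} node, record the tag information in its attributes, and insert it only once, with no later updates.
!image-2022-11-17-22-05-53-013.png!
For more details, see: https://apache-iotdb.feishu.cn/docx/Q4TydTObJofXTTxuF1Icuq4Knlh#HaaId8weUomuUsxUHr3cVyqKnZb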
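
The "insert once, never update" idea for the {{_tags}} node can be sketched in Go as below. This is only an illustration under stated assumptions: the `tagsRegistry` type, the `statementFor` method, the path `root.sg_0`, and the BOOLEAN/PLAIN type and encoding are all hypothetical. The statement text is modeled on IoTDB's `CREATE TIMESERIES ... ATTRIBUTES(...)` syntax and should be checked against the server version in use.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// tagsRegistry remembers which storage groups already have a _tags node,
// so the metadata statement is produced at most once per group.
type tagsRegistry struct {
	seen map[string]bool
}

func newTagsRegistry() *tagsRegistry {
	return &tagsRegistry{seen: map[string]bool{}}
}

// statementFor returns the statement that creates <group>._tags with the
// given attributes, and true, the first time a group is seen. Later calls
// for the same group return "", false. Values are not escaped in this sketch.
func (r *tagsRegistry) statementFor(group string, tags map[string]string) (string, bool) {
	if r.seen[group] {
		return "", false
	}
	r.seen[group] = true

	// Sort keys so the generated statement is deterministic.
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("'%s'='%s'", k, tags[k]))
	}
	stmt := fmt.Sprintf(
		"CREATE TIMESERIES %s._tags WITH DATATYPE=BOOLEAN, ENCODING=PLAIN ATTRIBUTES(%s)",
		group, strings.Join(pairs, ", "))
	return stmt, true
}

func main() {
	reg := newTagsRegistry()
	tags := map[string]string{"region": "eu-west-1", "datacenter": "eu-west-1b"}

	// First row for the group: the metadata statement is emitted.
	if stmt, ok := reg.statementFor("root.sg_0", tags); ok {
		fmt.Println(stmt)
	}
	// Every later row for the same group: nothing is emitted.
	if _, ok := reg.statementFor("root.sg_0", tags); !ok {
		fmt.Println("tags already recorded for root.sg_0")
	}
}
```

Under the old approach the tag values would be written again with every row; here they cost one metadata statement per storage group.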



[jira] [Commented] (IOTDB-4473) Support TSBS Timeseries Benchmark

2022-11-16 Thread Xingyu Liu (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-4473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17634921#comment-17634921
 ] 

Xingyu Liu commented on IOTDB-4473:
---

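There are currently two optimizations in progress:

1. Data insertion now uses InsertRecords. Tablets are not used because, within one batch, there may be very little data for any given node, so a large number of tablets with only a few rows each would have to be inserted, which is inefficient.

2. Tags are now handled with a node named _tags that stores no data and records the tag information in attributes. "Tags" here means what the tsbs project calls tags. IoTDB's own tags structure is not used because attributes do not build an inverted index and therefore use fewer resources, and because in the tsbs tests the tag values never change and play no part in queries.

Both optimizations should improve insert-test performance. The first is finished; the second is being tested and tuned.

The actual effect of the optimizations will be measured later.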
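
To illustrate the reasoning in item 1, the following Go sketch counts how many rows of a batch belong to each device and flattens the batch into per-row records. The `row` type, the helper names, and the sample data are hypothetical. The sketch does not call iotdb-client-go; it only shows the shape of the data that a record-oriented insert would receive.

```go
package main

import "fmt"

// row is one data point of a tsbs batch (hypothetical type).
type row struct {
	Device       string
	Time         int64
	Measurements []string
	Values       []float64
}

// rowsPerDevice counts how many rows of the batch belong to each device.
// A tablet holds rows of a single device, so one tablet per device would
// contain exactly this many rows.
func rowsPerDevice(batch []row) map[string]int {
	counts := make(map[string]int)
	for _, r := range batch {
		counts[r.Device]++
	}
	return counts
}

// toRecords flattens a batch into parallel slices with one entry per row.
// Rows of different devices can share a single request in this form.
func toRecords(batch []row) (devices []string, times []int64, measurements [][]string, values [][]float64) {
	for _, r := range batch {
		devices = append(devices, r.Device)
		times = append(times, r.Time)
		measurements = append(measurements, r.Measurements)
		values = append(values, r.Values)
	}
	return
}

func main() {
	batch := []row{
		{"root.cpu.host_0", 1000, []string{"usage_user"}, []float64{10}},
		{"root.cpu.host_1", 1000, []string{"usage_user"}, []float64{20}},
		{"root.cpu.host_2", 1000, []string{"usage_user"}, []float64{30}},
		{"root.cpu.host_0", 2000, []string{"usage_user"}, []float64{11}},
	}
	counts := rowsPerDevice(batch)
	fmt.Printf("%d rows across %d devices\n", len(batch), len(counts))

	devices, times, _, _ := toRecords(batch)
	fmt.Println(devices, times)
}
```

With many devices and a modest batch size, each device contributes only a handful of rows, which is the situation that makes per-device tablets unattractive.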



[jira] [Commented] (IOTDB-4473) Support TSBS Timeseries Benchmark

2022-11-10 Thread Xingyu Liu (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-4473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17632005#comment-17632005
 ] 

Xingyu Liu commented on IOTDB-4473:
---

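User handbook preview (English): [tsbs/iotdb.md at iotdb · citrusreticulata/tsbs (github.com)|https://github.com/citrusreticulata/tsbs/blob/iotdb/docs/iotdb.md]

Because readers would otherwise also need to read the main tsbs README, the Chinese user manual will be more detailed and will cover general tsbs usage in addition to the IoTDB part. It is still being written.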



[jira] [Commented] (IOTDB-4473) Support TSBS Timeseries Benchmark

2022-11-09 Thread Xingyu Liu (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-4473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17631368#comment-17631368
 ] 

Xingyu Liu commented on IOTDB-4473:
---

Developer documentation preview (in Chinese): 
[https://apache-iotdb.feishu.cn/docx/Q4TydTObJofXTTxuF1Icuq4Knlh]



[jira] [Commented] (IOTDB-4473) Support TSBS Timeseries Benchmark

2022-11-08 Thread Xingyu Liu (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-4473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630399#comment-17630399
 ] 

Xingyu Liu commented on IOTDB-4473:
---

tsbs_run_queries_iotdb has been completed. The insert test part has also been completed.

The whole project is now complete except for the documentation.



[jira] [Commented] (IOTDB-4473) Support TSBS Timeseries Benchmark

2022-11-06 Thread Xingyu Liu (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-4473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629455#comment-17629455
 ] 

Xingyu Liu commented on IOTDB-4473:
---

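Insert throughput for different numbers of workers, from a single test run

!image-2022-11-06-19-37-51-033.png!

Notes:
 # The results are not averaged over multiple runs; they come from a single run and are for reference only!
 # Test environment: CPU: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz, base speed 2.59 GHz, 6 cores, 12 logical processors, virtualization enabled, L1 cache 384 KB, L2 cache 1.5 MB, L3 cache 12.0 MB; RAM: 16 GB
 # Reading from an SSD versus an HDD makes little difference in throughput. Throughput reaches a high level once the number of workers exceeds about 10; adding more workers after that yields diminishing gains.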



[jira] [Commented] (IOTDB-4473) Support TSBS Timeseries Benchmark

2022-11-03 Thread Xingyu Liu (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-4473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628420#comment-17628420
 ] 

Xingyu Liu commented on IOTDB-4473:
---

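2022 Nov 04
h1. About batch

I followed up on the earlier question: the custom batch implementation does affect performance. I had assumed that using Go's native arrays was hurting insert performance and holding the insert rate to only 4000~4200 rows/sec, so I tried imitating the bytes.Buffer-based implementations used for other databases. Unexpectedly, this was even slower in some cases; the leaner "optimization" actually made things worse.

So I turned to the performance of iotdb-client-go itself. With no preprocessing at all, directly inserting data of the same scale with the same method (inserting by executing SQL statements) gives a write throughput of 4500 rows/sec. By comparison, the existing implementation in tsbs loses only about 6%~10% of that, and this time may be spent by tsbs on tasks such as handling the datasource.

So the current batch implementation appears to have no fundamental performance flaw.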
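
The overhead figure can be checked with a short calculation. The Go sketch below uses only the throughput numbers quoted above; the `overheadPercent` helper is a hypothetical name introduced for illustration.

```go
package main

import "fmt"

// overheadPercent returns how much lower measured is than baseline,
// expressed as a percentage of baseline.
func overheadPercent(baseline, measured float64) float64 {
	return (baseline - measured) / baseline * 100
}

func main() {
	const baseline = 4500.0 // rows/sec, iotdb-client-go with no preprocessing

	// tsbs implementation: 4000~4200 rows/sec
	fmt.Printf("best case:  %.1f%%\n", overheadPercent(baseline, 4200))
	fmt.Printf("worst case: %.1f%%\n", overheadPercent(baseline, 4000))
}
```

This gives roughly 6.7% and 11.1%, which is in line with the approximate 6%~10% range stated above.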



[jira] [Commented] (IOTDB-4473) Support TSBS Timeseries Benchmark

2022-11-03 Thread Xingyu Liu (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-4473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628309#comment-17628309
 ] 

Xingyu Liu commented on IOTDB-4473:
---

2022 Nov 03

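There are still some open questions; investigating or resolving them may help improve test performance:
 # In the insert test, does the custom batch affect insert performance?
 # In the insert test, the Processor struct accepts a WorkerNum parameter when its variables are initialized. Does it need to be used, and would using it improve efficiency? Most of the supported databases, such as timescaledb, questdb, cratedb, and mongodb, simply ignore this parameter; InfluxDB does support it.
 # The insert test inserts data with SQL statements, while the Go client supports InsertRecords. It is worth checking whether that method improves insert performance. (It should.)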



[jira] [Commented] (IOTDB-4473) Support TSBS Timeseries Benchmark

2022-11-03 Thread Xingyu Liu (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-4473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628305#comment-17628305
 ] 

Xingyu Liu commented on IOTDB-4473:
---

2022 Nov 03

The main work is divided into four parts:

1. Data generation
2. Query generation
3. Insert test (loading the test cases)
4. Query test

Parts 1, 2, and 3 have been completed so far.
