Hi,

The TsFile proposal is as follows. Feel free to give advice :-)

Abstract

TsFile is a columnar storage file format designed for time series data. It supports efficient compression, high read and write throughput, and compatibility with various frameworks such as Spark and Flink. It is easy to integrate TsFile into IoT big data processing frameworks.

Proposal

TsFile is used for managing time series data. Although it was first used inside IoTDB, many users and companies now use TsFile directly as an independent time series data management solution. In addition, there is a growing demand for TsFile implementations in multiple languages, such as C++, Go and Rust.

The Apache IoTDB community hereby submits this proposal to make TsFile an independent Apache project. The proposal outlines the key features and benefits of TsFile, along with the integration plan and the need for multi-language support.

Background

Time series data is becoming increasingly important in a wide range of applications, including IoT, intelligent control, finance, log analysis, and monitoring systems.

TsFile has been developed in Java by the Apache IoTDB community and lives in the IoTDB repository. Users can store time series data in TsFile, then read and analyze it in IoTDB, Spark and Flink. IoTDB can also generate TsFiles and synchronize TsFiles between two IoTDB instances. Furthermore, the demand for TsFile implementations in multiple programming languages has been growing, as they would allow developers to use TsFile's capabilities in their preferred language.

TsFile offers several distinctive features and benefits:

- Efficient Storage and Compression: TsFile employs advanced compression techniques to minimize storage requirements, reducing disk space consumption and improving system efficiency.
- Flexible Schema and Metadata Management: TsFile allows data to be written directly without predefining the schema, which is flexible for data acquisition.
- High Query Performance with Time Range: TsFile indexes the device, sensor and time dimensions to accelerate queries, enabling fast filtering and retrieval of time series data.
- Seamless Integration: TsFile is designed to integrate seamlessly with existing big data frameworks, such as Spark, Flink and Hadoop.

Rationale

Before TsFile, there was no file format designed for time series data. Companies in industry usually wrote time series data in various user-defined file formats, or used general-purpose columnar file formats such as Parquet and ORC. Without a standard, data collection and processing were complicated.

With TsFile, organizations can write data as TsFiles on end devices or gateways, then transfer the TsFiles to the cloud for unified management in IoTDB and other systems. This lowers network transmission and computing resource consumption in the cloud.

Initial Goals

The initial goals include:

- Make TsFile an independent project with its own SDK and documentation, so that it is easier to use.
- Provide TsFile implementations in languages other than Java, such as C++, Go and Rust.
- Integrate more encoding and compression methods into TsFile.
- Provide more tools for TsFile: a visualization tool, a parsing tool, and a repair tool.

Current Status

Meritocracy

We plan to invite the IoTDB committers to be the initial committers of TsFile. We will follow ASF meritocratic principles and invite additional developers to participate. We will establish documentation and encourage and monitor community participation so that privileges can be extended to those who contribute.

Community

The TsFile community has grown out of the Apache IoTDB community. The IoTDB community is introducing TsFile at many technical conferences. Next, we will set up mailing lists for more convenient and broader communication and for archived discussions. We are open to recruiting more developers from diverse backgrounds.

Core Developers

The initial TsFile PMC members are from the IoTDB community: Christofer Dutz, Xiangdong Huang, Jialin Qiao, Steve Yurong Su, Jinrui Zhang, Yuan Tian, Xinyu Tan, Haonan Hou, Gaofei Cao, Tian Jiang, Chao Wang (wangchao316), Chao Wang (mychaow), Houliang Qi, Kun Liu. These people have extensive experience in building databases and data management systems.

Alignment

The ASF is the natural choice to host the TsFile project, as its goal of encouraging community-driven open-source projects fits our vision for TsFile. Additionally, many other projects that we are familiar with and expect TsFile to integrate with, such as Apache Spark, Apache Flink and Apache IoTDB, are hosted by the ASF, and we will both benefit from and provide benefits to them through close proximity.

Known Risks

Project Name

TsFile has been used in IoTDB and other scenarios for over 7 years, and its name is unique.

Orphaned Products

The core developers plan to work full time on the project. There is very little risk of TsFile being abandoned, as it is part of Apache IoTDB's internal infrastructure. Tsinghua and NEL-BDS Lab rely on TsFile as a platform for a large number of long-term research projects. Companies such as Timecho, Huawei, BONC and Yonyou will also participate in this project.

Inexperience with Open Source

All of the core developers have experience with open source development. The initial TsFile PMC includes 1 Apache Board member (Christofer Dutz), 2 Apache Members (Xiangdong Huang, Jialin Qiao) and 11 Apache IoTDB PMC members. We have experience with the Apache Way, such as community over code, license management, version releases, CVE processing, attracting committers and PMC members, and meetups.

Length of Incubation

Enough of the initial TsFile PMC members know the ASF process well, so we are applying to go straight to TLP.

Homogeneous Developers

The current core developers are from diverse groups: Apache IoTDB Community, Timecho, BONC, Huawei, eBay, Yonyou and Tsinghua University.

Reliance on Salaried Developers

Currently, the developers are paid to work at Timecho, BONC, Huawei and Tsinghua University. We also have a community of students, researchers and professors at universities whose research focuses on big data management and analytics. It is unlikely that they will change their research focus away from big data management. We will work to ensure that the project can continue to be stewarded and move forward independently of salaried developers.

Relationships with Other Apache Products

TsFile is used by the Apache IoTDB project as its default data file format. TsFile-Spark-connector and TsFile-Flink-connector have been developed to support analyzing time series data with Apache Spark and Flink. Overall, TsFile is designed with an open architecture, and it can be integrated with many other systems in the future.

An Excessive Fascination with the Apache Brand

We respect the reputation of the Apache brand and have no doubt that it will attract contributors and users. Most of the initial developers come from the Apache IoTDB community, so we have no intent to use the Apache brand for profit. Our goal is to integrate TsFile more closely with Apache projects and to make the Apache community more professional in IoT data management.

Documentation

Documentation: https://iotdb.apache.org/UserGuide/V1.2.x/API/Programming-TsFile-API.html
Examples: https://github.com/apache/iotdb/tree/master/example/tsfile

Initial Source

TsFile core: https://github.com/apache/iotdb/tree/master/iotdb-core/tsfile
Spark TsFile Connector: https://github.com/apache/iotdb/blob/master/iotdb-connector/spark-tsfile
Flink TsFile Connector: https://github.com/apache/iotdb/blob/master/iotdb-connector/flink-tsfile-connector

Source and Intellectual Property Submission Plan

External Dependencies

- zstd-jni: BSD 2-Clause License
- logback-classic: Eclipse Public License - v 1.0
- snappy-java: Apache License 2.0
- commons-io: Apache License 2.0
- commons-lang3: Apache License 2.0
- lz4-java: Apache License 2.0
- gson: Apache License 2.0
- slice: Apache License 2.0
- xz: Public Domain
- Mock: Apache License 2.0

Cryptography

All code is public under the Apache License 2.0.

Required Resources

Mailing Lists

- priv...@tsfile.apache.org
- d...@tsfile.apache.org
- comm...@tsfile.apache.org

Subversion Directory

Git is the preferred source control system: git://git.apache.org/tsfile

Issue Tracking

JIRA TsFile (TsFile)

Initial Committers

- Yuan Tian (jackietien at apache dot org)
- Chao Wang (wangchao316 at apache dot org)
- Christofer Dutz (cdutz at apache dot org)
- Jinrui Zhang (xingtanzjr at apache dot org)
- Steve Yurong Su (rong at apache dot org)
- Xinyu Tan (tanxinyu at apache dot org)
- Haonan Hou (haonan at apache dot org)
- Gaofei Cao (gaogaofei at apache dot org)
- Jialin Qiao (qiaojialin at apache dot org)
- Kun Liu (liukun at apache dot org)
- Houliang Qi (neuyilan at apache dot org)
- Xiangdong Huang (hxd at apache dot org)
- Chao Wang (chaow at apache dot org)
- Jianmin Wang (jimwang at apache dot org)
- Tian Jiang (jiangtian at apache dot org)
- Xinyi Zhao (zhaoxinyi at apache dot org)
- Shuo Zhang (shuozhagn at apache dot org)

Sponsors

Champion: TBD
Nominated Mentors: TBD

Thanks,
—————————————————
Jialin Qiao
Apache IoTDB PMC