Hi,

The TsFile proposal is as follows, feel free to give advice :-)

Abstract

TsFile is a columnar storage file format designed for time series
data. It supports efficient compression, high read and write
throughput, and compatibility with various frameworks, such as Spark
and Flink. It is easy to integrate TsFile into IoT big data processing
frameworks.

Proposal

TsFile is used for managing time series data. Although it was first
used inside IoTDB, many users and companies use TsFile directly as an
independent time series data management solution. In addition, there
is a growing demand for TsFile implementations in multiple languages,
such as C++, Go and Rust.

The Apache IoTDB community hereby submits this proposal for TsFile as
an independent Apache project. This proposal outlines the key features
and benefits of TsFile, along with the integration plan and the need
for multi-language support.

Background

Time series data is becoming increasingly important in a wide range of
applications, including IoT, intelligent control, finance, log
analysis, and monitoring systems.
TsFile has been developed by the Apache IoTDB community in Java and
lives in the IoTDB repository. Users can store time series data using
TsFile, then read and analyze it in IoTDB, Spark and Flink. IoTDB can
also generate TsFiles and synchronize TsFiles between two IoTDB
instances. Furthermore, the demand for TsFile implementations in
multiple programming languages has been growing, as they would allow
developers to use TsFile's capabilities in their preferred language.

TsFile offers several distinctive features and benefits:
Efficient Storage and Compression: TsFile employs advanced compression
techniques to minimize storage requirements, resulting in reduced disk
space consumption and improved system efficiency.
Flexible Schema and Metadata Management: TsFile allows data to be
written directly without predefining the schema, which makes data
acquisition flexible.
High Query Performance over Time Ranges: TsFile indexes the device,
sensor and time dimensions to accelerate queries, enabling fast
filtering and retrieval of time series data.
Seamless Integration: TsFile is designed to seamlessly integrate with
existing big data frameworks, such as Spark, Flink and Hadoop.


Rationale

Before TsFile, there was no file format designed specifically for
time series. Companies in the industry usually write time series data
in various user-defined file formats without unification, or use
general columnar file formats such as Parquet and ORC. Without a
standard, data collection and processing become complicated. With
TsFile, organizations can write data in TsFile on end devices or
gateways, then transfer the TsFiles to the cloud for unified
management in IoTDB and other systems. In this way, we lower the
network transmission and the computing resource consumption in the
cloud.

Initial Goals

The initial goals include:

Make TsFile an independent project that has its own SDK and
documentation and is easier to use.
Support TsFile implementations in languages other than Java, such as
C++, Go and Rust.
Integrate more encoding and compression methods in TsFile.
Provide more tools for TsFile: visualization tool, parsing tool,
repair tool.


Current Status

Meritocracy

We plan to invite the IoTDB committers to be the initial committers of
TsFile. We would also like to follow ASF meritocratic principles and
invite additional developers to participate. We will establish the
documentation and encourage and monitor community participation so
that privileges can be extended to those who contribute.

Community

The TsFile community has grown out of the Apache IoTDB community. The
IoTDB community has been introducing TsFile at many technical
conferences. Next, we will set up mailing lists for more convenient
and broader communication and for archived discussions. We are open to
recruiting more developers from diverse backgrounds.

Core Developers

The initial TsFile PMC members are from the IoTDB community:
Christofer Dutz, Xiangdong Huang, Jialin Qiao, Steve Yurong Su, Jinrui
Zhang, Yuan Tian, Xinyu Tan, Haonan Hou, Gaofei Cao, Tian Jiang, Chao
Wang (wangchao316), Chao Wang (mychaow), Houliang Qi, Kun Liu.
These people have extensive experience in building databases and data
management systems.

Alignment

The ASF is the natural choice to host the TsFile project, as its goal
of encouraging community-driven open-source projects fits with our
vision for TsFile. Additionally, many other projects that we are
familiar with and expect TsFile to integrate with, such as Apache
Spark, Apache Flink and Apache IoTDB, are hosted by the ASF, and we
will both benefit from and provide benefits to them through close
proximity.

Known Risks

Project Name

The TsFile project has been used in IoTDB and other scenarios for
over 7 years, and its name is unique.

Orphaned Products

The core developers plan to work full time on the project. There is
very little risk of TsFile being abandoned as it is part of Apache
IoTDB's internal infrastructure. Tsinghua and NEL-BDS Lab rely on
TsFile as a platform for a large number of long-term research
projects. Companies such as Timecho, Huawei, BONC and Yonyou will also
participate in this project.

Inexperience with Open Source

All of the core developers have experience with open source
development. We have 1 Apache Board member (Christofer Dutz), 2 Apache
Members (Xiangdong Huang, Jialin Qiao) and 11 Apache IoTDB PMC members
among the initial TsFile PMC members. We have experience with the
Apache Way, such as community over code, license management, version
releases, CVE processing, attracting committers/PMC members and
meetups.

Length of Incubation

Enough of the initial TsFile PMC members know the ASF process well, so
we apply to go straight to TLP.

Homogeneous Developers

The current core developers are from diverse groups: Apache IoTDB
Community, Timecho, BONC, Huawei, eBay, Yonyou and Tsinghua
University.

Reliance on Salaried Developers

Currently, the developers are paid to work at Timecho, BONC, Huawei
and Tsinghua University. We also have a community of students and
researchers/professors in universities, whose research focuses on big
data management and analytics. It is unlikely that they will change
their research focus away from big data management. We will work to
ensure that the project can continue to be stewarded and move forward
independently of salaried developers.

Relationships with Other Apache Products

TsFile is used by the Apache IoTDB project as the default data file
format. TsFile-Spark-connector and TsFile-Flink-connector have been
developed to support analysing time series data with Apache Spark and
Apache Flink.
Overall, TsFile is designed with an open architecture, and it can be
integrated with many other systems in the future.

An Excessive Fascination with the Apache Brand

We respect the reputation of the Apache brand and have no doubt that
it will attract contributors and users. Most of the initial developers
come from the Apache IoTDB community, so we have no intent to use the
Apache brand for profit. Our goal is to integrate TsFile more closely
with Apache projects and to make the Apache community more
professional in IoT data management.

Documentation

Documentation:
https://iotdb.apache.org/UserGuide/V1.2.x/API/Programming-TsFile-API.html
Examples: https://github.com/apache/iotdb/tree/master/example/tsfile

Initial Source

TsFile core: https://github.com/apache/iotdb/tree/master/iotdb-core/tsfile
Spark TsFile Connector:
https://github.com/apache/iotdb/blob/master/iotdb-connector/spark-tsfile
Flink TsFile Connector:
https://github.com/apache/iotdb/blob/master/iotdb-connector/flink-tsfile-connector


Source and Intellectual Property Submission Plan

External Dependencies

zstd-jni: BSD 2-Clause License
logback-classic: Eclipse Public License - v 1.0
snappy-java: Apache License 2.0
commons-io: Apache License 2.0
commons-lang3: Apache License 2.0
lz4-java: Apache License 2.0
gson: Apache License 2.0
slice: Apache License 2.0
xz: Public Domain
Mock: Apache License 2.0


Cryptography

All code is public under the Apache License 2.0.

Required Resources


Mailing Lists

priv...@tsfile.apache.org
d...@tsfile.apache.org
comm...@tsfile.apache.org

Subversion Directory

Git is the preferred source control system: git://git.apache.org/tsfile

Issue Tracking

JIRA TsFile (TsFile)

Initial Committers

Yuan Tian (jackietien at apache dot org)
Chao Wang (wangchao316 at apache dot org)
Christofer Dutz (cdutz at apache dot org)
Jinrui Zhang (xingtanzjr at apache dot org)
Steve Yurong Su (rong at apache dot org)
Xinyu Tan (tanxinyu at apache dot org)
Haonan Hou (haonan at apache dot org)
Gaofei Cao (gaogaofei at apache dot org)
Jialin Qiao (qiaojialin at apache dot org)
Kun Liu (liukun at apache dot org)
Houliang Qi (neuyilan at apache dot org)
Xiangdong Huang (hxd at apache dot org)
Chao Wang (chaow at apache dot org)
Jianmin Wang (jimwang at apache dot org)
Tian Jiang (jiangtian at apache dot org)
Xinyi Zhao (zhaoxinyi at apache dot org)
Shuo Zhang (shuozhagn at apache dot org)

Sponsors

Champion: TBD
Nominated Mentors: TBD


Thanks,
—————————————————
Jialin Qiao
Apache IoTDB PMC
