Hi Yuchen,

Do I understand correctly that, with this, a client could subscribe to data
and receive it in TsFile format? This, plus the ability of an IoTDB instance
to subscribe to such a TsFile subscription, would align perfectly with my
plans for 2024: allowing TsFile libraries on PLCs to send data to IoTDB, as
well as simplifying data-collection gateways.

Chris

From: Yuchen Ding <vgalax...@apache.org>
Date: Friday, 31 May 2024 at 10:14
To: dev@iotdb.apache.org <dev@iotdb.apache.org>
Subject: Re: [Proposal] Data Subscription Client on IoTDB
Hello everyone,

I am VGalaxies. Recently, I have been working on TsFile subscription support
for the IoTDB data subscription client. The motivation for this feature is to
enable data file export and backup for multi-replica clusters through data
subscription. With the existing data subscription client, subscribed data
arrives as a SessionDataSet: the server has to parse the TsFile, the client
has to write the data back into a TsFile, and raw data is transmitted over the
network without benefiting from TsFile's high compression. We therefore hope
to support exporting TsFiles in data subscriptions using the TsFile client SDK.

In terms of functionality, using TsFile subscription with the data
subscription client involves three steps. First, create a topic whose data
presentation format is TsFile by setting the topic format to TsFileHandler.
Second, when creating a consumer, specify the directory where subscribed
TsFiles will be saved using the fileSaveDir parameter. Third, obtain the
handler that matches the type of the SubscriptionMessage, which in this case
is SubscriptionTsFileHandler. SubscriptionTsFileHandler wraps file operations
such as cp, mv, and rm, implemented with the Java standard library, and can
also iterate over the data through the TsFile SDK, i.e., TsFileReader. Users
can thus set up a TsFile subscription with minimal configuration.
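
To illustrate the kind of copy, move, and delete operations such a handler
wraps, here is a minimal sketch. It is not the IoTDB SubscriptionTsFileHandler:
the class TsFileHandlerSketch and its method names are hypothetical, and
Python's pathlib and shutil stand in for the Java standard library calls.

```python
import shutil
from pathlib import Path


class TsFileHandlerSketch:
    """Hypothetical stand-in for a handler around one received file."""

    def __init__(self, path):
        self.path = Path(path)

    def copy_file(self, target):
        """Copy the file to target (like cp) and return the copy's path."""
        target = Path(target)
        target.parent.mkdir(parents=True, exist_ok=True)
        return Path(shutil.copy2(self.path, target))

    def move_file(self, target):
        """Move the file to target (like mv); the handler then tracks the new path."""
        target = Path(target)
        target.parent.mkdir(parents=True, exist_ok=True)
        self.path = Path(shutil.move(str(self.path), str(target)))
        return self.path

    def delete_file(self):
        """Delete the file the handler currently points at (like rm)."""
        self.path.unlink()
```

A consumer might copy each received file to a backup location and then delete
the local copy once the backup has been verified.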

Technically, we have restructured the message payload of the pipeSubscribe RPC
poll type so that it can carry data in both SessionDataSet and TsFile formats.
The transmission of a TsFile is divided into multiple events: a tsfile init
event, a tsfile piece event, and a tsfile seal event. Reliable transmission of
TsFiles is achieved through the exchange of these events between the DataNode
(DN) side and the client side.
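
The following is a minimal sketch of how a client could reassemble a file from
init, piece, and seal events. It is not the actual pipeSubscribe payload
format: the event classes, the offset and total_length fields, and the rule
that a piece is accepted only when its offset matches the bytes already
written are all assumptions made for illustration.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class InitEvent:
    file_name: str


@dataclass
class PieceEvent:
    file_name: str
    offset: int        # assumed field: byte position where this piece starts
    payload: bytes


@dataclass
class SealEvent:
    file_name: str
    total_length: int  # assumed field: expected size of the complete file


class FileReceiver:
    """Reassembles one file at a time from init/piece/seal events."""

    def __init__(self, save_dir):
        self.save_dir = save_dir
        self.file_name = None
        self.written = 0
        self.handle = None

    def _tmp_path(self):
        return os.path.join(self.save_dir, self.file_name + ".part")

    def on_event(self, event):
        """Apply one event.

        Returns the next expected offset after init/piece events, and the
        final file path after a seal event.
        """
        if isinstance(event, InitEvent):
            if self.handle is not None:
                self.handle.close()  # abandon any unfinished transfer
            self.file_name = event.file_name
            self.written = 0
            self.handle = open(self._tmp_path(), "wb")
            return 0
        if self.handle is None or event.file_name != self.file_name:
            raise ValueError("event for a file that was never initialised")
        if isinstance(event, PieceEvent):
            if event.offset != self.written:
                # Duplicate or out-of-order piece: write nothing and report
                # the offset from which the sender should resume.
                return self.written
            self.handle.write(event.payload)
            self.written += len(event.payload)
            return self.written
        if isinstance(event, SealEvent):
            if event.total_length != self.written:
                raise ValueError("sealed length does not match bytes received")
            self.handle.close()
            self.handle = None
            final_path = os.path.join(self.save_dir, self.file_name)
            os.replace(self._tmp_path(), final_path)  # publish the finished file
            return final_path
        raise TypeError("unknown event type")


if __name__ == "__main__":
    receiver = FileReceiver(tempfile.mkdtemp())
    receiver.on_event(InitEvent("demo.tsfile"))
    receiver.on_event(PieceEvent("demo.tsfile", 0, b"hello "))
    receiver.on_event(PieceEvent("demo.tsfile", 6, b"world"))
    print(receiver.on_event(SealEvent("demo.tsfile", 11)))
```

Writing to a temporary ".part" file and renaming it only on seal means a
reader of the save directory never sees a half-transferred file.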

Currently, TsFile subscription still has some limitations. For example, when
the format of the topic is TsFileHandler, there are certain constraints on its
path and time configuration. We will continue to address these limitations in
future iterations.

I have implemented initial support for TsFile subscription in this PR[1]. I
hope you are interested in this feature and would like to participate in its
development and testing. You can also leave your comments and suggestions in
this thread. Any suggestions, feedback, and contributions are appreciated.

Thank you for your attention and support.

Best regards,
VGalaxies

Reference:
1. https://github.com/apache/iotdb/pull/12326

On 2024/04/08 03:06:59 VGalaxies wrote:
> Hello everyone,
>
> I am VGalaxies, a new contributor to Apache IoTDB. I am excited to
> share with you a new feature that I have been working on for the past
> few months.
>
> The data subscription client is a new way to access data within IoTDB,
> distinct from the traditional method of querying data using SQL-like
> syntax. In scenarios where real-time data, quick response to data
> changes, and building highly event-driven systems are required, data
> subscription has greater advantages over data querying. For example,
> in the following two scenarios:
>
> 1. Replace extensive polling queries for large amounts of data: Avoid
> significant impact on the performance of existing systems when
> querying frequently or when there are many data points. Also avoid
> the problem of determining the query scope, and ensure that downstream
> systems receive complete and accurate data.
> 2. Facilitate downstream system integration: It's easier to integrate
> with components such as Flink, Spark, Kafka/DataX, Camel/MySQL, etc.
> There's no need to customize the logic of IoTDB's data change capture
> for each big data component separately, simplifying integration
> component design and making it easier for users.
>
> The IoTDB subscription client borrows concepts from message queue
> products such as Kafka. It is built on three core concepts:
> Topic, Consumer, and Consumer Group.
>
> - Topic is a logical concept used by the IoTDB subscription client to
> classify data, serving as a channel for data publication. Producers
> publish data to specific topics, while consumers subscribe to these
> topics to receive related data. In the IoTDB subscription client,
> topics describe the sequence characteristics, time characteristics,
> presentation forms, and optional custom processing logic of subscribed
> data.
> - Consumer is an application or service in the IoTDB subscription
> client responsible for receiving and processing data published to
> specific topics. Consumers retrieve data from the queue and perform
> corresponding processing. The IoTDB subscription client provides two
> types of consumers: pull consumer and push consumer.
> - Consumer Group is a collection of consumers. When different
> consumers in the same consumer group subscribe to the same topic,
> they share the consumption progress for that topic. Each piece of
> data under the topic is processed by only one consumer within the
> group, so data is not processed repeatedly.
>
> Based on these concepts, the IoTDB subscription client provides a
> series of SDKs for creating topics, creating consumers, subscribing to
> topics, consuming data, committing consumption progress, and obtaining
> subscription relationships. Here's a comprehensive example[1]
> demonstrating how to use the subscription client Java SDK to consume
> data from IoTDB.
>
> Technically, the data subscription client relies on IoTDB's
> existing stream processing framework (Pipe). Each subscription
> corresponds to a user-invisible pipe task. Subscription relationships
> and other metadata information are persistently maintained through the
> config node. Basic functionality has been developed on the master
> branch[2], and further iterations will continuously improve it.
>
> I hope you are interested in this feature and would like to
> participate in the development and testing. You can also leave your
> comments and suggestions in this thread. Any suggestions, feedback,
> and contributions are appreciated.
>
> Thank you for your attention and support.
>
> Best regards,
>
> VGalaxies
>
> Reference:
> 1. https://github.com/apache/iotdb/blob/master/example/session/src/main/java/org/apache/iotdb/SubscriptionSessionExample.java
> 2. https://github.com/apache/iotdb/tree/master
>
