Hi Yuchen,

do I understand correctly that with this, a client could subscribe to data and receive it in TsFile format? This, plus the ability of an IoTDB instance to subscribe to such a TsFile subscription, would align perfectly with my plans for 2024: allowing TsFile libraries on PLCs to send data to IoTDB, as well as simplifying data-collection gateways.
Chris

From: Yuchen Ding <vgalax...@apache.org>
Date: Friday, 31 May 2024 at 10:14
To: dev@iotdb.apache.org <dev@iotdb.apache.org>
Subject: Re: [Proposal] Data Subscription Client on IoTDB

Hello everyone,

I am VGalaxies. Recently, I have been working on providing TsFile subscription support for the IoTDB data subscription client.

The background of this feature is to achieve data file export and backup for multi-replica clusters using data subscription. With the existing data subscription client, the subscribed data is delivered in the form of SessionDataSet. The server needs to parse the TsFile, and the client needs to rewrite it. Raw data is transmitted over the network, without leveraging the high compression of TsFile. Therefore, we hope to support exporting TsFile using the TsFile client SDK in data subscriptions.

In terms of functionality, TsFile subscription support for the data subscription client includes three steps. First, create a topic with a data presentation format of TsFile, by specifying the topic format as TsFileHandler. Second, when creating a consumer, specify the directory where the subscribed TsFile will be saved using the fileSaveDir parameter. Third, obtain the corresponding handler based on the type of SubscriptionMessage, which is SubscriptionTsFileHandler. SubscriptionTsFileHandler encapsulates operations like cp, mv, and rm from the Java standard library, and can also iterate over data through the TsFile SDK, i.e., TsFileReader. Users can achieve TsFile subscription with minimal configuration.

Technically, we have restructured the Message Payload of the pipeSubscribe RPC poll type to support the transmission of data in both SessionDataSet and TsFile formats. The transmission of a TsFile is divided into multiple events, including the tsfile init event, tsfile piece event, and tsfile seal event. The reliable transmission of TsFile files is achieved through the interaction of these events between the DN side and the client side.
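As a rough illustration of the init/piece/seal exchange described above, here is a minimal client-side sketch in Java. It is not the code from the PR: the class TsFileReceiverSketch, its method names, the ".part" temporary suffix, and the offset and length checks are all assumptions made for illustration. The thread itself only states that a transfer is split into init, piece, and seal events.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical receiver for one TsFile transfer; NOT the IoTDB implementation. */
final class TsFileReceiverSketch {
  private final Path saveDir;  // plays the role of the fileSaveDir setting
  private String currentName;  // file currently being received, null when idle
  private Path tempFile;       // partial file, renamed on seal
  private long writtenBytes;   // next offset the receiver expects

  TsFileReceiverSketch(Path saveDir) {
    this.saveDir = saveDir;
  }

  /** init event: start (or restart) a transfer into an empty temporary file. */
  void onInit(String fileName) throws IOException {
    currentName = fileName;
    tempFile = saveDir.resolve(fileName + ".part"); // assumed naming scheme
    Files.deleteIfExists(tempFile);
    Files.createFile(tempFile);
    writtenBytes = 0;
  }

  /**
   * piece event: append the payload only if it starts at the expected offset.
   * Returns the next expected offset, so a sender could resend from there
   * (assumed acknowledgement scheme).
   */
  long onPiece(String fileName, long offset, byte[] payload) throws IOException {
    requireActive(fileName);
    if (offset == writtenBytes) {
      Files.write(tempFile, payload, StandardOpenOption.APPEND);
      writtenBytes += payload.length;
    }
    return writtenBytes;
  }

  /** seal event: check the total length, then move the file to its final name. */
  Path onSeal(String fileName, long expectedLength) throws IOException {
    requireActive(fileName);
    if (writtenBytes != expectedLength) {
      throw new IOException(
          "incomplete transfer: got " + writtenBytes + " of " + expectedLength + " bytes");
    }
    Path target = saveDir.resolve(fileName);
    Files.move(tempFile, target, StandardCopyOption.REPLACE_EXISTING);
    currentName = null;
    return target;
  }

  private void requireActive(String fileName) {
    if (!fileName.equals(currentName)) {
      throw new IllegalStateException("no active transfer for " + fileName);
    }
  }
}
```

In this sketch a duplicated or out-of-order piece is simply ignored and the receiver reports the offset it still expects, which is one simple way to make retransmission safe. The actual protocol may handle this differently.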
Currently, TsFile subscription still has some limitations. For example, when the format of the topic is TsFileHandler, there are certain constraints on its path and time configuration. We will continue to optimize these issues in future iterations.

I have initially implemented support for TsFile subscription in this PR[1]. I hope you are interested in this feature and would like to participate in the development and testing. You can also leave your comments and suggestions in this thread. Appreciate any suggestion/feedback & contribution.

Thank you for your attention and support.

Best regards,
VGalaxies

Reference:
1. https://github.com/apache/iotdb/pull/12326

On 2024/04/08 03:06:59 VGalaxies wrote:
> Hello everyone,
>
> I am VGalaxies, a new contributor to Apache IoTDB. I am excited to
> share with you a new feature that I have been working on for the past
> few months.
>
> The data subscription client is a new way to access data within IoTDB,
> distinct from the traditional method of querying data using SQL-like
> syntax. In scenarios where real-time data, quick response to data
> changes, and building highly event-driven systems are required, data
> subscription has greater advantages over data querying. For example,
> in the following two scenarios:
>
> 1. Replace extensive polling queries for large amounts of data: Avoid
> significant impacts on the performance of existing systems when
> querying frequently or when there are many data points. Also, avoid
> problems with determining the query scope and ensure downstream
> receives accurate full data.
> 2. Facilitate downstream system integration: It's easier to integrate
> with components such as Flink, Spark, Kafka/DataX, Camel/MySQL, etc.
> There's no need to customize the logic of IoTDB's data change capture
> for each big data component separately, simplifying integration
> component design and making it easier for users.
>
> The IoTDB subscription client references some features defined by some
> message queue products like Kafka. It consists of 3 core concepts:
> Topic, Consumer, and Consumer Group.
>
> - Topic is a logical concept used by the IoTDB subscription client to
> classify data, serving as a channel for data publication. Producers
> publish data to specific topics, while consumers subscribe to these
> topics to receive related data. In the IoTDB subscription client,
> topics describe the sequence characteristics, time characteristics,
> presentation forms, and optional custom processing logic of subscribed
> data.
> - Consumer is an application or service in the IoTDB subscription
> client responsible for receiving and processing data published to
> specific topics. Consumers retrieve data from the queue and perform
> corresponding processing. The IoTDB subscription client provides two
> types of consumers: pull consumer and push consumer.
> - Consumer Group is a collection of consumers. When different
> consumers in the same consumer group subscribe to the same topic,
> these consumers share the processing progress of data under this
> topic. Each data under this topic can only be processed by one
> consumer within the group, ensuring that data is not processed
> repeatedly.
>
> Based on these concepts, the IoTDB subscription client provides a
> series of SDKs for creating topics, creating consumers, subscribing to
> topics, consuming data, committing consumption progress, and obtaining
> subscription relationships. Here's a comprehensive example[1]
> demonstrating how to use the subscription client JAVA SDK to consume
> data from IoTDB.
>
> Technically, the data subscription client will rely on IoTDB's
> existing streaming processing framework (Pipe). Each subscription
> corresponds to a user-invisible pipe task. Subscription relationships
> and other metadata information are persistently maintained through the
> config node.
> Basic functionality has been developed on the master
> branch[2], and further iterations will continuously improve it.
>
> I hope you are interested in this feature and would like to
> participate in the development and testing. You can also leave your
> comments and suggestions in this thread. Appreciate any
> suggestion/feedback & contribution.
>
> Thank you for your attention and support.
>
> Best regards,
>
> VGalaxies
>
> Reference:
> 1. https://github.com/apache/iotdb/blob/master/example/session/src/main/java/org/apache/iotdb/SubscriptionSessionExample.java
> 2. https://github.com/apache/iotdb/tree/master
>