fgerlits commented on code in PR #1483: URL: https://github.com/apache/nifi-minifi-cpp/pull/1483#discussion_r1135682353
##########
PROCESSORS.md:
##########

@@ -249,165 +305,176 @@ In the list below, the names of required properties appear in bold. Any other pr
 ### Description
 Compresses or decompresses the contents of FlowFiles using a user-specified compression algorithm and updates the mime.type attribute as appropriate
+
 ### Properties
 In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
-| Name               | Default Value           | Allowable Values | Description                                                                    |
-|--------------------|-------------------------|------------------|--------------------------------------------------------------------------------|
-| Compression Format | use mime.type attribute |                  | The compression format to use.                                                 |
-| Compression Level  | 1                       |                  | The compression level to use; this is valid only when using GZIP compression.  |
-| Mode               | compress                |                  | Indicates whether the processor should compress content or decompress content. |
-| Update Filename    | false                   |                  | Determines if filename extension need to be updated                            |
+| Name               | Default Value           | Allowable Values                                                 | Description |
+|--------------------|-------------------------|------------------------------------------------------------------|-------------|
+| Mode               | compress                | compress<br/>decompress                                          | Indicates whether the processor should compress content or decompress content. |
+| Compression Level  | 1                       |                                                                  | The compression level to use; this is valid only when using GZIP compression. |
+| Compression Format | use mime.type attribute | bzip2<br/>gzip<br/>lzma<br/>use mime.type attribute<br/>xz-lzma2 | The compression format to use. |
+| Update Filename    | false                   |                                                                  | Determines if the filename extension needs to be updated |
+| Encapsulate in TAR | true                    |                                                                  | If true, on compression the FlowFile is added to a TAR archive and then compressed, and on decompression a compressed, TAR-encapsulated FlowFile is expected.<br/>If false, on compression the content of the FlowFile simply gets compressed, and on decompression simple compressed content is expected.<br/>true is the behaviour compatible with older MiNiFi C++ versions; false is the behaviour compatible with NiFi. |
+| Batch Size         | 1                       |                                                                  | Maximum number of FlowFiles processed in a single session |
+
 ### Relationships
 | Name    | Description                                                                                                   |
 |---------|---------------------------------------------------------------------------------------------------------------|
-| failure | FlowFiles will be transferred to the failure relationship if they fail to compress/decompress                 |
 | success | FlowFiles will be transferred to the success relationship after successfully being compressed or decompressed |
+| failure | FlowFiles will be transferred to the failure relationship if they fail to compress/decompress                 |

 ## ConsumeJournald
+
 ### Description
-Consume systemd-journald journal messages. Available on Linux only.
+
+Consume systemd-journald journal messages. Creates one flow file per message. Fields are mapped to attributes. Realtime timestamp is mapped to the 'timestamp' attribute. Available on Linux only.

 ### Properties
-All properties are required with a default value, making them effectively optional. None of the properties support the NiFi Expression Language.
-| Name                 | Default Value | Allowable Values                                                                   | Description |
-|----------------------|---------------|------------------------------------------------------------------------------------|-------------|
-| Batch Size           | 1000          | Positive numbers                                                                   | The maximum number of entries processed in a single execution. |
-| Payload Format       | Syslog        | Raw<br>Syslog                                                                      | Configures flow file content formatting.<br>Raw: only the message.<br>Syslog: similar to syslog or journalctl output. |
-| Include Timestamp    | true          | true<br>false                                                                      | Include message timestamp in the 'timestamp' attribute. |
-| Journal Type         | System        | User<br>System<br>Both                                                             | Type of journal to consume. |
-| Process Old Messages | false         | true<br>false                                                                      | Process events created before the first usage (schedule) of the processor instance. |
-| Timestamp Format     | %x %X %Z      | [date format](https://howardhinnant.github.io/date/date.html#to_stream_formatting) | Format string to use when creating the timestamp attribute or writing messages in the syslog format. ISO/ISO 8601/ISO8601 are equivalent to "%FT%T%Ez". |
+In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
+
+| Name                     | Default Value | Allowable Values         | Description |
+|--------------------------|---------------|--------------------------|-------------|
+| **Batch Size**           | 1000          |                          | The maximum number of entries processed in a single execution. |
+| **Payload Format**       | Syslog        | Raw<br/>Syslog           | Configures flow file content formatting. Raw: only the message. Syslog: similar to syslog or journalctl output. |
+| **Include Timestamp**    | true          |                          | Include message timestamp in the 'timestamp' attribute. |
+| **Journal Type**         | System        | Both<br/>System<br/>User | Type of journal to consume. |
+| **Process Old Messages** | false         |                          | Process events created before the first usage (schedule) of the processor instance. |
+| **Timestamp Format**     | %x %X %Z      |                          | Format string to use when creating the timestamp attribute or writing messages in the syslog format. |

 ### Relationships
-| Name    | Description                    |
-|---------|--------------------------------|
-| success | Journal messages as flow files |
+| Name    | Description                             |
+|---------|-----------------------------------------|
+| success | Successfully consumed journal messages. |

 ## ConsumeKafka

 ### Description
 Consumes messages from Apache Kafka and transforms them into MiNiFi FlowFiles. The application should make sure that the processor is triggered at regular intervals, even if no messages are expected, to serve any queued callbacks waiting to be called. Rebalancing can also only happen on trigger.
+
 ### Properties
 In the list below, the names of required properties appear in bold. Any other properties (not in bold) are considered optional. The table also indicates any default values, and whether a property supports the NiFi Expression Language.
-| Name                         | Default Value  | Allowable Values                                       | Description |
-|------------------------------|----------------|--------------------------------------------------------|-------------|
-| Duplicate Header Handling    | Keep Latest    | Comma-separated Merge<br>Keep First<br>Keep Latest<br> | For headers to be added as attributes, this option specifies how to handle cases where multiple headers are present with the same key. For example in case of receiving these two headers: "Accept: text/html" and "Accept: application/xml" and we want to attach the value of "Accept" as a FlowFile attribute:<br/> - "Keep First" attaches: "Accept -> text/html"<br/> - "Keep Latest" attaches: "Accept -> application/xml"<br/> - "Comma-separated Merge" attaches: "Accept -> text/html, application/xml" |
-| **Group ID**                 | | | A Group ID is used to identify consumers that are within the same consumer group. Corresponds to Kafka's 'group.id' property.<br/>**Supports Expression Language: true** |
-| Headers To Add As Attributes | | | A comma separated list to match against all message headers. Any message header whose name matches an item from the list will be added to the FlowFile as an Attribute. If not specified, no Header values will be added as FlowFile attributes. The behaviour on when multiple headers of the same name are present is set using the DuplicateHeaderHandling attribute. |
-| **Honor Transactions**       | true           | | Specifies whether or not MiNiFi should honor transactional guarantees when communicating with Kafka. If false, the Processor will use an "isolation level" of read_uncomitted. This means that messages will be received as soon as they are written to Kafka but will be pulled, even if the producer cancels the transactions. If this value is true, MiNiFi will not receive any messages for which the producer's transaction was canceled, but this can result in some latency since the consumer must wait for the producer to finish its entire transaction instead of pulling as the messages become available. |
-| **Kafka Brokers**            | localhost:9092 | | A comma-separated list of known Kafka Brokers in the format <host>:<port>.<br/>**Supports Expression Language: true** |
-| Kerberos Keytab Path         | | | The path to the location on the local filesystem where the kerberos keytab is located. Read permission on the file is required. |
-| Kerberos Principal           | | | Keberos Principal |
-| Kerberos Service Name        | | | Kerberos Service Name |
-| **Key Attribute Encoding**   | UTF-8          | Hex<br>UTF-8<br> | FlowFiles that are emitted have an attribute named 'kafka.key'. This property dictates how the value of the attribute should be encoded. |
-| Max Poll Records             | 10000          | | Specifies the maximum number of records Kafka should return when polling each time the processor is triggered. |
-| **Max Poll Time**            | 4 seconds      | | Specifies the maximum amount of time the consumer can use for polling data from the brokers. Polling is a blocking operation, so the upper limit of this value is specified in 4 seconds. |
-| Message Demarcator           | | | Since KafkaConsumer receives messages in batches, you have an option to output FlowFiles which contains all Kafka messages in a single batch for a given topic and partition and this property allows you to provide a string (interpreted as UTF-8) to use for demarcating apart multiple Kafka messages. This is an optional property and if not provided each Kafka message received will result in a single FlowFile which time it is triggered. <br/>**Supports Expression Language: true** |
-| Message Header Encoding      | UTF-8          | Hex<br>UTF-8<br> | Any message header that is found on a Kafka message will be added to the outbound FlowFile as an attribute. This property indicates the Character Encoding to use for deserializing the headers. |
-| **Offset Reset**             | latest         | earliest<br>latest<br>none<br> | Allows you to manage the condition when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted). Corresponds to Kafka's 'auto.offset.reset' property. |
-| Password                     | | | The password for the given username when the SASL Mechanism is sasl_plaintext |
-| SASL Mechanism               | GSSAPI         | GSSAPI<br/>PLAIN | The SASL mechanism to use for authentication. Corresponds to Kafka's 'sasl.mechanism' property. |
-| **Security Protocol**        | plaintext      | plaintext<br/>ssl<br/>sasl_plaintext<br/>sasl_ssl | Protocol used to communicate with brokers. Corresponds to Kafka's 'security.protocol' property. |
-| Session Timeout              | 60 seconds     | | Client group session and failure detection timeout. The consumer sends periodic heartbeats to indicate its liveness to the broker. If no heartbeats are received by the broker for a group member within the session timeout, the broker will remove the consumer from the group and trigger a rebalance. The allowed range is configured with the broker configuration properties group.min.session.timeout.ms and group.max.session.timeout.ms. |
-| SSL Context Service          | | | SSL Context Service Name |
-| **Topic Name Format**        | Names          | Names<br>Patterns<br> | Specifies whether the Topic(s) provided are a comma separated list of names or a single regular expression. Using regular expressions does not automatically discover Kafka topics created after the processor started. |
-| **Topic Names**              | | | The name of the Kafka Topic(s) to pull from. Multiple topic names are supported as a comma separated list.<br/>**Supports Expression Language: true** |
-| Username                     | | | The username when the SASL Mechanism is sasl_plaintext |
-### Properties
+| Name                         | Default Value  | Allowable Values                                  | Description |
+|------------------------------|----------------|---------------------------------------------------|-------------|
+| SSL Context Service          | | | SSL Context Service Name |
+| **Security Protocol**        | plaintext      | plaintext<br/>sasl_plaintext<br/>sasl_ssl<br/>ssl | Protocol used to communicate with brokers. Corresponds to Kafka's 'security.protocol' property. |
+| Kerberos Service Name        | | | Kerberos Service Name |
+| Kerberos Principal           | | | Kerberos Principal |
+| Kerberos Keytab Path         | | | The path to the location on the local filesystem where the kerberos keytab is located. Read permission on the file is required. |
+| **SASL Mechanism**           | GSSAPI         | GSSAPI<br/>PLAIN | The SASL mechanism to use for authentication. Corresponds to Kafka's 'sasl.mechanism' property. |
+| Username                     | | | The username when the SASL Mechanism is sasl_plaintext |
+| Password                     | | | The password for the given username when the SASL Mechanism is sasl_plaintext |
+| **Kafka Brokers**            | localhost:9092 | | A comma-separated list of known Kafka Brokers in the format <host>:<port>.<br/>**Supports Expression Language: true** |
+| **Topic Names**              | | | The name of the Kafka Topic(s) to pull from. Multiple topic names are supported as a comma separated list.<br/>**Supports Expression Language: true** |
+| **Topic Name Format**        | Names          | Names<br/>Patterns | Specifies whether the Topic(s) provided are a comma separated list of names or a single regular expression. Using regular expressions does not automatically discover Kafka topics created after the processor started. |
+| **Honor Transactions**       | true           | | Specifies whether or not MiNiFi should honor transactional guarantees when communicating with Kafka. If false, the Processor will use an "isolation level" of read_uncommitted. This means that messages will be received as soon as they are written to Kafka but will be pulled, even if the producer cancels the transactions. If this value is true, MiNiFi will not receive any messages for which the producer's transaction was canceled, but this can result in some latency since the consumer must wait for the producer to finish its entire transaction instead of pulling as the messages become available. |
+| **Group ID**                 | | | A Group ID is used to identify consumers that are within the same consumer group. Corresponds to Kafka's 'group.id' property.<br/>**Supports Expression Language: true** |
+| **Offset Reset**             | latest         | earliest<br/>latest<br/>none | Allows you to manage the condition when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted). Corresponds to Kafka's 'auto.offset.reset' property. |
+| **Key Attribute Encoding**   | UTF-8          | Hex<br/>UTF-8 | FlowFiles that are emitted have an attribute named 'kafka.key'. This property dictates how the value of the attribute should be encoded. |
+| Message Demarcator           | | | Since KafkaConsumer receives messages in batches, you have an option to output FlowFiles which contain all Kafka messages in a single batch for a given topic and partition, and this property allows you to provide a string (interpreted as UTF-8) to use for demarcating multiple Kafka messages. This is an optional property; if not provided, each Kafka message received will result in a single FlowFile each time the processor is triggered. <br/>**Supports Expression Language: true** |
+| Message Header Encoding      | UTF-8          | Hex<br/>UTF-8 | Any message header that is found on a Kafka message will be added to the outbound FlowFile as an attribute. This property indicates the Character Encoding to use for deserializing the headers. |
+| Headers To Add As Attributes | | | A comma separated list to match against all message headers. Any message header whose name matches an item from the list will be added to the FlowFile as an Attribute. If not specified, no Header values will be added as FlowFile attributes. The behaviour on when multiple headers of the same name are present is set using the DuplicateHeaderHandling attribute. |

Review Comment:
   fixed in f48606ccce5a4ce7bcd3ddb73ed5eb013d03b5c8

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@nifi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
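As an aside, the "Duplicate Header Handling" semantics described in the quoted ConsumeKafka documentation (Keep First / Keep Latest / Comma-separated Merge, with the "Accept" header example) can be sketched in plain Python. This is not MiNiFi code; the function name and header representation are illustrative assumptions:

```python
def resolve_duplicate_headers(headers, key, mode):
    """Collapse multiple Kafka headers with the same key into one attribute value,
    following the three Duplicate Header Handling options from the table above.
    `headers` is a list of (name, value) pairs, as Kafka allows repeated header keys."""
    values = [value for name, value in headers if name == key]
    if not values:
        return None  # header absent: no attribute is attached
    if mode == "Keep First":
        return values[0]
    if mode == "Keep Latest":
        return values[-1]
    if mode == "Comma-separated Merge":
        return ", ".join(values)
    raise ValueError("unknown Duplicate Header Handling mode: " + mode)

# The example from the documentation: two "Accept" headers on one message.
headers = [("Accept", "text/html"), ("Accept", "application/xml")]
assert resolve_duplicate_headers(headers, "Accept", "Keep First") == "text/html"
assert resolve_duplicate_headers(headers, "Accept", "Keep Latest") == "application/xml"
assert resolve_duplicate_headers(headers, "Accept", "Comma-separated Merge") == "text/html, application/xml"
```

The three asserts mirror the three bullet points in the property description verbatim.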
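Similarly, the two "Encapsulate in TAR" behaviours documented for CompressContent (TAR-then-compress for older MiNiFi C++ compatibility, plain compression for NiFi compatibility) can be illustrated with a stdlib-only sketch, here using the gzip format. This is not the processor's implementation; the archive entry name "content" is an illustrative assumption:

```python
import gzip
import io
import tarfile

def compress(content, encapsulate_in_tar):
    """Sketch of CompressContent in 'compress' mode with the gzip format."""
    if not encapsulate_in_tar:
        # NiFi-compatible behaviour: the FlowFile content itself is gzipped.
        return gzip.compress(content)
    # Behaviour compatible with older MiNiFi C++ versions: wrap the content
    # in a TAR archive first, then gzip the whole archive.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name="content")  # entry name is illustrative
        info.size = len(content)
        tar.addfile(info, io.BytesIO(content))
    return gzip.compress(buf.getvalue())

def decompress(data, encapsulate_in_tar):
    """Sketch of CompressContent in 'decompress' mode with the gzip format."""
    raw = gzip.decompress(data)
    if not encapsulate_in_tar:
        return raw  # plain compressed content was expected
    # A compressed, TAR-encapsulated FlowFile was expected: unpack the first entry.
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r") as tar:
        member = tar.getmembers()[0]
        return tar.extractfile(member).read()

payload = b"hello flowfile"
for tar_mode in (True, False):
    assert decompress(compress(payload, tar_mode), tar_mode) == payload
```

Note that the two settings are not interchangeable: data compressed with `Encapsulate in TAR = true` cannot be decompressed as plain gzipped content, which is why the property must match between the compressing and decompressing side.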