[ https://issues.apache.org/jira/browse/NIFI-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773629#comment-17773629 ]
ASF subversion and git services commented on NIFI-11985:
--------------------------------------------------------

Commit c09134779542612a4cbc5d02b7c4f1c612db9425 in nifi's branch refs/heads/main from Chris Sampson
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=c091347795 ]

NIFI-11985: Add ConsumeElasticsearch processor

Signed-off-by: Joe Gresock <jgres...@gmail.com>

This closes #7671.

> Implement a processor to consume documents from Elasticsearch indices
> ---------------------------------------------------------------------
>
>                 Key: NIFI-11985
>                 URL: https://issues.apache.org/jira/browse/NIFI-11985
>             Project: Apache NiFi
>          Issue Type: New Feature
>            Reporter: Chris Sampson
>            Assignee: Chris Sampson
>            Priority: Minor
>             Fix For: 1.latest, 2.latest
>
>         Attachments: NIFI-11985_Flow.json
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> It is possible to use Elasticsearch to store series data, i.e. data that is
> continually added to an Elasticsearch index over time, with a {{date}} or a
> 1-up numeric {{long}} field.
> This is more likely with the advent of [Data
> Streams|https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html]
> or the recent [Time Series Data
> Streams|https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds.html],
> both of which use a {{@timestamp}} field to indicate when a document was
> added to the stream.
> There are use cases where NiFi users may want to consume new data from the
> Elasticsearch index/data stream after it has arrived, then pass it to another
> service.
> NiFi would need to:
> * know which field to use as the "series field" (e.g. {{@timestamp}})
> * track the last read "series field" value via State so that the same
> documents are not retrieved from Elasticsearch multiple times
> * allow for the optional specification of the "last read" field value, e.g.
> if a user wants to offset the start of the documents to be read (this value
> should only be used if a value doesn't also exist within the processor's
> State)
> * allow for the fact that the "last read" value will be blank when the
> processor is first run (and the value is not otherwise specified), meaning we
> want to retrieve all existing data
> * allow for users to specify an optional Query Filter to apply to the search
> within Elasticsearch when finding documents to retrieve
> Possible implementations should consider using the {{SearchElasticsearch}}
> processor as a basis, which already uses State tracking between processor
> executions and allows for the retrieval of Elasticsearch documents in a
> paginated manner (thus avoiding pulling too much data in a single request).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
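
The requirements listed in the issue description could be sketched roughly as follows. This is only an illustration of the "series field" tracking idea, not the code of the ConsumeElasticsearch processor: the function names `build_search_body` and `update_state`, the `last_read` state key, and the use of a plain dict in place of NiFi's State are all assumptions made for the example. The request body uses the standard Elasticsearch query DSL (`range`, `bool`/`filter`, `match_all`, `sort`, `size`).

```python
from typing import Any, Optional


def build_search_body(
    series_field: str,
    state: dict,
    initial_value: Optional[Any] = None,
    query_filter: Optional[dict] = None,
    page_size: int = 100,
) -> dict:
    """Build a search request body for the next page of unread documents.

    Hypothetical helper: `state` stands in for the processor's State.
    """
    # A value already held in state wins over the user-supplied initial value.
    last_read = state.get("last_read", initial_value)

    filters = []
    if last_read is not None:
        # Only fetch documents newer than the last one already read.
        filters.append({"range": {series_field: {"gt": last_read}}})
    if query_filter is not None:
        # Optional user-specified Query Filter.
        filters.append(query_filter)

    # With no last-read value and no filter, retrieve all existing data.
    query = {"bool": {"filter": filters}} if filters else {"match_all": {}}

    # Sort ascending so the final hit of a page carries the highest value.
    return {"query": query, "sort": [{series_field: "asc"}], "size": page_size}


def update_state(state: dict, hits: list, series_field: str) -> None:
    """Remember the series field value of the last hit in an ascending page."""
    if hits:
        state["last_read"] = hits[-1]["_source"][series_field]
```

One design point worth noting in this sketch: a strict `gt` comparison may skip documents that share exactly the same series field value as the last one read, so a real implementation would likely need extra care around ties.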