[jira] [Updated] (NIFI-11985) Implement a processor to consume documents from Elasticsearch indices

Chris Sampson (Jira) Thu, 07 Sep 2023 23:47:05 -0700


     [ 
https://issues.apache.org/jira/browse/NIFI-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Chris Sampson updated NIFI-11985:
---------------------------------
    Fix Version/s: 1.latest
                   2.latest

> Implement a processor to consume documents from Elasticsearch indices
> ---------------------------------------------------------------------
>
>                 Key: NIFI-11985
>                 URL: https://issues.apache.org/jira/browse/NIFI-11985
>             Project: Apache NiFi
>          Issue Type: New Feature
>            Reporter: Chris Sampson
>            Assignee: Chris Sampson
>            Priority: Minor
>             Fix For: 1.latest, 2.latest
>
>
> Elasticsearch can be used to store series data, i.e. data that is 
> continually added to an index over time and ordered by a {{date}} field or 
> an incrementing numeric {{long}} field.
> This is more likely with the advent of [Data 
> Streams](https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html)
>  or the recent [Time Series Data 
> Streams](https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds.html),
>  both of which use a {{@timestamp}} field to indicate when a document was 
> added to the stream.
> There are use cases where NiFi users may want to consume new data from the 
> Elasticsearch index/data stream after it's arrived, then pass it to another 
> service.
> NiFi would need to:
> * know which field to use as the "series field" (e.g. {{@timestamp}})
> * track the last read "series field" value via State so that the same 
> documents are not retrieved from Elasticsearch multiple times
> * allow for the optional specification of the "last read" field value, e.g. 
> if a user wants to offset the start of the documents to be read (this value 
> should only be used if a value doesn't also exist within the processor's 
> State)
> * allow for the fact that the "last read" value will be blank when the 
> processor is first run (and the value is not otherwise specified), meaning we 
> want to retrieve all existing data
> * allow for users to specify an optional Query Filter to apply to the search 
> within Elasticsearch when finding documents to retrieve
> Possible implementations should consider using the {{SearchElasticsearch}} 
> processor as a basis, which already uses State tracking between processor 
> executions and allows for the retrieval of Elasticsearch documents in a 
> paginated manner (thus avoiding pulling too much data in a single request).
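
The requirements listed above can be sketched roughly as follows. This is only an illustration of the intended behaviour, not the NiFi implementation: the function names (resolve_last_read, build_search_body, update_state) and the "last_read" state key are hypothetical, and a plain dict stands in for the processor's State. The request body uses standard Elasticsearch Query DSL (a bool/filter with a range clause, plus sort and size).

```python
from typing import Any, Dict, List, Optional


def resolve_last_read(state: Dict[str, str], configured_start: Optional[str]) -> Optional[str]:
    """Prefer the value held in State; use the configured start only if State is empty."""
    stored = state.get("last_read")  # hypothetical state key
    return stored if stored is not None else configured_start


def build_search_body(
    series_field: str,
    last_read: Optional[str],
    query_filter: Optional[Dict[str, Any]] = None,
    page_size: int = 100,
) -> Dict[str, Any]:
    """Build a search body that only matches documents after the last read value."""
    filters: List[Dict[str, Any]] = []
    if last_read is not None:
        # Only documents strictly newer than the last one already consumed.
        filters.append({"range": {series_field: {"gt": last_read}}})
    if query_filter is not None:
        # Optional user-supplied Query Filter.
        filters.append(query_filter)
    # With no last read value and no filter, retrieve all existing data.
    query = {"bool": {"filter": filters}} if filters else {"match_all": {}}
    return {
        "query": query,
        "sort": [{series_field: {"order": "asc"}}],
        "size": page_size,
    }


def update_state(state: Dict[str, str], hits: List[Dict[str, Any]], series_field: str) -> None:
    """Record the series value of the last hit; hits are assumed sorted ascending."""
    if hits:
        state["last_read"] = hits[-1]["_source"][series_field]
```

On a first run with empty State and no configured start value, resolve_last_read returns None and the body falls back to match_all, so all existing documents are retrieved. One caveat of this sketch: a strict "gt" comparison could skip documents that share the last read value but are indexed later, so a real implementation may need a tie-breaker.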



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
