[jira] [Commented] (NIFI-11985) Implement a processor to consume documents from Elasticsearch indices

ASF subversion and git services (Jira) Tue, 10 Oct 2023 03:02:35 -0700


    [ 
https://issues.apache.org/jira/browse/NIFI-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773629#comment-17773629
 ]


ASF subversion and git services commented on NIFI-11985:
--------------------------------------------------------

Commit c09134779542612a4cbc5d02b7c4f1c612db9425 in nifi's branch 
refs/heads/main from Chris Sampson
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=c091347795 ]

NIFI-11985: Add ConsumeElasticsearch processor

Signed-off-by: Joe Gresock <jgres...@gmail.com>
This closes #7671.


> Implement a processor to consume documents from Elasticsearch indices
> ---------------------------------------------------------------------
>
>                 Key: NIFI-11985
>                 URL: https://issues.apache.org/jira/browse/NIFI-11985
>             Project: Apache NiFi
>          Issue Type: New Feature
>            Reporter: Chris Sampson
>            Assignee: Chris Sampson
>            Priority: Minor
>             Fix For: 1.latest, 2.latest
>
>         Attachments: NIFI-11985_Flow.json
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> It is possible to use Elasticsearch to store series data, i.e. data that is 
> continually added to an Elasticsearch index over time and that carries a 
> {{date}} or a 1-up (incrementing) numeric {{long}} field.
> This is more likely with the advent of [Data 
> Streams|https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html]
>  or the recent [Time Series Data 
> Streams|https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds.html],
>  both of which use a {{@timestamp}} field to indicate when a document was 
> added to the stream.
> There are use cases where NiFi users may want to consume new data from the 
> Elasticsearch index/data stream after it's arrived, then pass it to another 
> service.
> NiFi would need to:
> * know which field to use as the "series field" (e.g. {{@timestamp}})
> * track the last read "series field" value via State so that the same 
> documents are not retrieved from Elasticsearch multiple times
> * allow for the optional specification of the "last read" field value, e.g. 
> if a user wants to offset the start of the documents to be read (this value 
> should only be used if a value doesn't also exist within the processor's 
> State)
> * allow for the fact that the "last read" value will be blank when the 
> processor is first run (and the value is not otherwise specified), meaning we 
> want to retrieve all existing data
> * allow for users to specify an optional Query Filter to apply to the search 
> within Elasticsearch when finding documents to retrieve
> Possible implementations should consider using the {{SearchElasticsearch}} 
> processor as a basis, which already uses State tracking between processor 
> executions and allows for the retrieval of Elasticsearch documents in a 
> paginated manner (thus avoiding pulling too much data in a single request).
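
The requirements listed in the description above can be sketched roughly as
follows. This is only an illustrative sketch in Python, not the code of the
ConsumeElasticsearch processor added by the commit. The function names
(resolve_last_read, build_query, update_state), the "last_read" state key and
the page_size default are all hypothetical. The request body uses the
standard Elasticsearch query DSL (bool/filter, range, match_all, sort, size).

```python
from typing import Any, Dict, List, Optional


def resolve_last_read(state: Dict[str, Any], initial_value: Optional[Any]) -> Optional[Any]:
    """Return the tracked value from state, else the user-supplied start value.

    The user-supplied value is only used when state holds nothing yet.
    """
    tracked = state.get("last_read")  # hypothetical state key
    return tracked if tracked is not None else initial_value


def build_query(series_field: str,
                last_read: Optional[Any],
                query_filter: Optional[Dict[str, Any]] = None,
                page_size: int = 100) -> Dict[str, Any]:
    """Build a search body that only matches documents after last_read."""
    filters: List[Dict[str, Any]] = []
    if last_read is not None:
        # Assumption: strictly-greater-than; documents that share the last
        # read value would need extra handling in a real implementation.
        filters.append({"range": {series_field: {"gt": last_read}}})
    if query_filter is not None:
        filters.append(query_filter)
    # With no last-read value and no filter, retrieve all existing data.
    query = {"bool": {"filter": filters}} if filters else {"match_all": {}}
    return {
        "query": query,
        "sort": [{series_field: "asc"}],  # oldest first, so the last hit is the newest
        "size": page_size,                # paginate rather than pull everything at once
    }


def update_state(state: Dict[str, Any],
                 hits: List[Dict[str, Any]],
                 series_field: str) -> Dict[str, Any]:
    """Record the series field value of the last hit in the page, if any."""
    if hits:
        state["last_read"] = hits[-1]["_source"][series_field]
    return state
```

On a first run with empty state and no configured start value, build_query
falls back to match_all, which corresponds to retrieving all existing data.
On later runs the range filter keeps already-consumed documents out of the
results.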



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
