Re: Getting Duplicate Flowfiles from InvokeHttp and QueryElasticsearchHttp

Bryan Bende Tue, 19 Mar 2019 08:06:26 -0700

Hi Martin,

Since you have a 2 node cluster, the processors are most likely
running on both nodes once you start them, so each node does the same
work. The stats and queues show the combined values across the
cluster, which is why you see 2 or 4 flowfiles instead of 1 or 2.
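
As a rough illustration of that arithmetic (a sketch only; the function
and parameter names are made up for this example and are not part of
NiFi), the combined count is the per-node output multiplied by the
number of nodes that actually run the processor:

```python
def combined_flowfiles(per_node_output: int, node_count: int,
                       execution: str = "ALL") -> int:
    """Flowfiles the UI would show when summed across the cluster.

    per_node_output: flowfiles one node produces per scheduled run.
    node_count:      number of nodes in the cluster.
    execution:       "ALL" (every node runs the processor) or
                     "PRIMARY" (only the primary node runs it).
    """
    if execution not in ("ALL", "PRIMARY"):
        raise ValueError("execution must be 'ALL' or 'PRIMARY'")
    running_nodes = node_count if execution == "ALL" else 1
    return per_node_output * running_nodes


# InvokeHttp: one response per run, two nodes, all nodes running.
print(combined_flowfiles(1, 2))             # 2
# QueryElasticsearchHttp: two hits per run, two nodes.
print(combined_flowfiles(2, 2))             # 4
# Same query restricted to the primary node.
print(combined_flowfiles(2, 2, "PRIMARY"))  # 2
```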


Each processor has an option on the scheduling tab of the
configuration to determine where it runs, either all nodes or primary
node only. You most likely want primary node only so that it only runs
on one of the nodes in the cluster, whichever is the primary node at
the time it is scheduled to run.
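
If you would rather script the change than click through the UI, the
sketch below builds the kind of request body I would expect a processor
update through the REST API to need. Treat the details as assumptions
to check against the REST API docs for your NiFi version: that the
update is a PUT to /nifi-api/processors/{id}, that the setting lives in
component.config.executionNode, and that the processor has to be
stopped first. The processor id and revision number are placeholders.

```python
import json


def primary_node_update(processor_id: str, revision_version: int) -> dict:
    """Build a processor-update body that selects primary-node execution.

    Both arguments are placeholders: read the real id and the current
    revision version from the processor before sending an update.
    """
    return {
        "revision": {"version": revision_version},
        "component": {
            "id": processor_id,
            # Assumed field name and value; "ALL" would mean every node.
            "config": {"executionNode": "PRIMARY"},
        },
    }


body = primary_node_update("hypothetical-processor-id", 0)
print(json.dumps(body, indent=2))
```

This only builds and prints the JSON; it does not contact NiFi.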

Hope that helps.

-Bryan

On Tue, Mar 19, 2019 at 10:58 AM martin.cooley <martin.coo...@gmail.com> wrote:
>
> Hey Bryan,
>
> Indeed it is a 2 node cluster.  I would like to say I see where this is
> going, but I don't.
>
> Thanks,
>
> Martin
>
>
>
> Bryan Bende wrote
> > Hello,
> >
> > Are you running a NiFi cluster of 2 nodes, or a standalone instance of
> > NiFi?
> >
> > -Bryan
> >
> > On Mon, Mar 18, 2019 at 12:21 PM Martin Cooley <martin.cooley@> wrote:
> >>
> >> If I configure an InvokeHttp processor to query against an elasticsearch
> >> node, I should get one json object written to a flowfile.  If I use the
> >> QueryElasticsearchHttp processor, if the query returns two documents from
> >> the index, I should get two json objects, each written to their own
> >> flowfile.
> >>
> >> However, the InvokeHttp processor is writing two flowfiles.  They have
> >> separate UUIDs, but the contents are the same.  Yes, the processor is
> >> scheduled to run every 900 seconds.
> >>
> >> The QueryElasticsearchHttp processor is writing 4 flowfiles.  It, too, is
> >> scheduled to run every 900 seconds.
> >>
> >> Elasticsearch is returning:
> >>
> >> {
> >>   "took": 1,
> >>   "timed_out": false,
> >>   "_shards": {
> >>     "total": 5,
> >>     "successful": 5,
> >>     "skipped": 0,
> >>     "failed": 0
> >>   },
> >>   "hits": {
> >>     "total": 2,
> >>     "max_score": 0.2876821,
> >>     "hits": [
> >>       {
> >>         "_index": "etltodoc",
> >>         "_type": "document_record",
> >>         "_id": "2045680246129",
> >>         "_score": 0.2876821,
> >>         "_source": {
> >>           "myguid": "2045680246129",
> >>           "filename": "sample1.pdf",
> >>           "exception": "",
> >>           "original_filename": "\\\\f1\\DocsRepo\\CF\\sample1.pdf",
> >>           "conceptCode": "C2159782",
> >>           "timestamp": "2019-03-12T12:43:21.166531",
> >>           "status": "delivered"
> >>         }
> >>       },
> >>       {
> >>         "_index": "etltodock",
> >>         "_type": "document_record",
> >>         "_id": "2045680246128",
> >>         "_score": 0.2876821,
> >>         "_source": {
> >>           "myguid": "2045680246128",
> >>           "filename": "sample2.pdf",
> >>           "exception": "",
> >>           "original_filename": "\\\\f1\\DocsRepo\\CF\\sample2.pdf",
> >>           "conceptCode": "C2159782",
> >>           "timestamp": "2019-03-12T12:43:21.165467",
> >>           "status": "delivered"
> >>         }
> >>       }
> >>     ]
> >>   }
> >> }
> >>
> >> I'm hoping I just have something misconfigured, but I have tried playing
> >> with just about every setting.  On the QueryElasticsearchHttp processor,
> >> if I set limit to one, I still get two flowfiles instead of four.
> >>
> >> Any help will be much appreciated.
> >>
> >> Martin
