Re: NiFi 2.0 and possibility to ignore X.509 certificate or force authorization with OpenID/SSO.

2024-10-02 Thread Jens M. Kofoed
I'm using Google Chrome version 129.0.6668.90 and have no problems. Try
starting an incognito tab and going to your NiFi web page. When I open the
browser the first time, I am prompted to select a certificate; whatever
I choose, Chrome will remember it until I have closed all browser windows. I
cannot remember if I have set up something about deleting cookies, e.g. when
I close the browser, but it could be something like that which gives you an
issue.
kind regards
Jens M. Kofoed

On Tue, Oct 1, 2024 at 12:41, Hans Deragon wrote:

> What browser and version are you using?  Where I work, we use Chrome
> 129 and Edge 129 but neither offers the chance to select a certificate.
>
> For Chrome, it used to behave like you mention but a few months ago, the
> behavior changed and there is no selection of certificate anymore (this
> behavior was useful to log into a NiFi instance without a certificate
> and into the NiFi Registry with a certificate).
>
> On 27/09/2024 02:15, Jens M. Kofoed wrote:
> > I’m using ldap instead of OpenID, but it is the same thing going on.
> When I go to the NiFi website my browser prompts me with the option to
> select my installed X.509 certificate, but I can just cancel using the
> certificate and I get to the login page.
> >
> > For me it’s not a big problem and I use it as a backup option. All users
> are handled by AD, via groups, but I have two admins who also have a local
> user so they can log in with certificates. This is a backup if connections
> to AD/ldap are down for some reason.
> >
> > Kind regards
> > Jens
> >
> >> On Sep 26, 2024, at 21:53, Hans Deragon wrote:
> >>
> >> Greetings,
> >>
> >> We discovered with NiFi 2.0.0-M4 that if a personal X.509 certificate
> is set in user accounts under Windows, that certificate is getting used by
> NiFi for authorization instead of the normal OpenID/SSO headers.  The user
> id in the X.509 certificate is not the same as the one in OpenID/SSO (Okta)
> and thus, the person is denied access to NiFi.
> >>
> >> This particular certificate is not meant to be used by NiFi to
> authenticate and authorize users in NiFi even though it is recognized by
> our Identity Provider.  We desire that NiFi only authenticate and authorize
> users with OpenID/SSO (which works when I remove the personal certificate
> from user's Windows workstations).
> >>
> >> It seems that there is no option available in nifi.properties to prevent
> this behaviour.  Thus, my following questions/remarks:
> >>
> >> - Is there a way to disable this behaviour?
> >>
> >> - If not, would it be acceptable to add a parameter in nifi.properties
> to disable the X.509 certificate extraction?  What name should this
> parameter have and how should it be implemented?  I could submit a pull
> request, but it would be nice to have some guidance from a NiFi developer.
> >>
> >> - Or... is there a way to change the program so that authorization does
> not fail as soon as one tested method fails, but succeeds if any other
> method succeeds?
> >>
> >> Technicalities:
> >>
> >> Changing the code in X509AuthenticationFilter.attemptAuthentication()
> to always return 'null' fixes our problem by making NiFi believe that no
> X.509 certificate is available and leaves the other filters to be tested,
> including the one handling OpenID/SSO.
> >>
> >> For my tests, I recompiled NiFi's code at Git tag 'rel/nifi-2.0.0-M4'.
> >>
> >> Best regards,
> >> Hans Deragon
> >>
> >> 
>
>
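
A minimal sketch of the workaround Hans describes: having the X.509 filter
report that no authentication attempt was made, so the remaining filters
(including the OpenID/SSO one) are evaluated. The class, package and method
signature here follow the NiFi 1.x source line and are assumptions to verify
against the rel/nifi-2.0.0-M4 tag he cites:

package org.apache.nifi.web.security.x509;

import javax.servlet.http.HttpServletRequest;

import org.apache.nifi.web.security.NiFiAuthenticationFilter;
import org.springframework.security.core.Authentication;

public class X509AuthenticationFilter extends NiFiAuthenticationFilter {

    @Override
    public Authentication attemptAuthentication(final HttpServletRequest request) {
        // Pretend no client certificate was presented; the filter chain then
        // falls through to the other authentication filters.
        return null;
    }
}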


Re: NiFi 2.0 and possibility to ignore X.509 certificate or force authorization with OpenID/SSO.

2024-09-26 Thread Jens M. Kofoed
I’m using ldap instead of OpenID, but it is the same thing going on. When I go
to the NiFi website my browser prompts me with the option to select my
installed X.509 certificate, but I can just cancel using the certificate and I
get to the login page.

For me it’s not a big problem and I use it as a backup option. All users are
handled by AD, via groups, but I have two admins who also have a local user so
they can log in with certificates. This is a backup if connections to AD/ldap
are down for some reason.

Kind regards 
Jens 

> On Sep 26, 2024, at 21:53, Hans Deragon wrote:
> 
> Greetings,
> 
> We discovered with NiFi 2.0.0-M4 that if a personal X.509 certificate is set 
> in user accounts under Windows, that certificate is getting used by NiFi for 
> authorization instead of the normal OpenID/SSO headers.  The user id in the 
> X.509 certificate is not the same as the one in OpenID/SSO (Okta) and thus, 
> the person is denied access to NiFi.
> 
> This particular certificate is not meant to be used by NiFi to authenticate 
> and authorize users in NiFi even though it is recognized by our Identity 
> Provider.  We desire that NiFi only authenticate and authorize users with 
> OpenID/SSO (which works when I remove the personal certificate from user's 
> Windows workstations).
> 
> It seems that there is no option available in nifi.properties to prevent this
> behaviour.  Thus, my following questions/remarks:
> 
> - Is there a way to disable this behaviour?
> 
> - If not, would it be acceptable to add a parameter in nifi.properties to
> disable the X.509 certificate extraction?  What name should this parameter
> have and how should it be implemented?  I could submit a pull request, but
> it would be nice to have some guidance from a NiFi developer.
> 
> - Or... is there a way to change the program so that authorization does not
> fail as soon as one tested method fails, but succeeds if any other method
> succeeds?
> 
> Technicalities:
> 
> Changing the code in X509AuthenticationFilter.attemptAuthentication() to
> always return 'null' fixes our problem by making NiFi believe that no X.509
> certificate is available and leaves the other filters to be tested,
> including the one handling OpenID/SSO.
> 
> For my tests, I recompiled NiFi's code at Git tag 'rel/nifi-2.0.0-M4'.
> 
> Best regards,
> Hans Deragon
> 
> 


Re: Smb processor landscape

2024-02-27 Thread Jens M. Kofoed
Hi Anders, you are not alone :-)

I have the same issues and created some related Jiras a while ago:
FetchSmb: Adding a completion strategy -
https://issues.apache.org/jira/browse/NIFI-12231?filter=-2
PutSmb: Add a rename strategy -
https://issues.apache.org/jira/browse/NIFI-10150?filter=-2

I hope some lovely programmer has the possibility to help us with these
SMB processors.

kind regards
Jens M. Kofoed


On Tue, Feb 27, 2024 at 11:36, Anders Synstad wrote:

> Hi,
>
> I've been trying to understand the smb-processor landscape lately, and
> have run
> into some problems.
>
> 1) Timeout
>
> It seems like timeout was introduced with the SmbjClientProviderService,
> and
> SmbUtils.java defines the withTimeout setting from the underlying smbj
> library.
>
> In SmbProperties.java, the default timeout value is set to 5 sec.
>
> The problem is that this code is used by the older GetSmbFile and
> PutSmbFile
> processors as well, but the timeout configuration item is not exposed in
> their
> config ui.
>
> 2) Minimum File Age
>
> The ListSmb processor supports "Minimum File Age", which is great, but the
> older GetSmbFile does not.
>
> GetSmbFile does not support incoming connections either, so you cannot use
> ListSmb before GetSmbFile to select files.
>
> 3) Keep Source File
>
> The older GetSmbFile processor has the "Keep Source File" setting for
> removing
> files after you get them, which is nice. However, the newer FetchSmb does
> not
> appear to support this.
>
> As a result, you cannot get and delete files with a minimum age greater
> than X.
>
> 4) DFS referrals
>
> The underlying hierynomus/smbj library supports DFS referrals with the
> withDfsEnabled settings. The default value for this in the library is
> False.
>
> It would be very useful if this setting was exposed in all smb related
> processors and services that use hierynomus/smbj.
>
> I created https://issues.apache.org/jira/browse/NIFI-12837 for that.
>
>
> I might be misunderstanding something here as I'm not a Java coder. I tried
> to patch the DFS feature, and it started working against the cifs cluster
> in my environment at least.
>
> I also attempted to patch the timeout ui setting in PutSmbFile, and that
> seemed
> to resolve some of the timeout issues I was seeing with PutSmbFile as well.
>
> But I don't know if there is some overarching plan or goal when it comes
> to the
> smb processors, or should I just create jira tickets for these things?
>
>
> --
> Anders Synstad
>
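
For reference, a small sketch of how the two smbj settings mentioned above
(timeout and DFS referrals) are wired up when building a client. It is based
on the hierynomus/smbj builder API; the exact method names should be checked
against the smbj version NiFi bundles, and exposing them in the processors
would amount to passing processor properties into this builder:

import java.util.concurrent.TimeUnit;

import com.hierynomus.smbj.SMBClient;
import com.hierynomus.smbj.SmbConfig;

public class SmbClientSketch {
    public static void main(String[] args) {
        // Explicit timeout (NiFi's SmbProperties default is 5 sec) and DFS
        // referrals enabled (smbj's own default is false).
        SmbConfig config = SmbConfig.builder()
                .withTimeout(5, TimeUnit.SECONDS)
                .withDfsEnabled(true)
                .build();
        try (SMBClient client = new SMBClient(config)) {
            // connect, authenticate and open shares as usual...
        }
    }
}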


Re: Help : LoadBalancer

2023-09-07 Thread Jens M. Kofoed
Hi Minh

Sorry for the long reply :-)

If you only have one load balancer in front of your NiFi cluster, you
introduce a single point of failure. For High Availability (HA), you can
have 2 nodes with load balancing in front of your NiFi cluster.
proxy-01 will have one IP address and proxy-02 will have another. You can
then create 2 DNS records pointing nifi-proxy to both proxy-01 and proxy-02.
This will give you some kind of HA, but you will rely on the capability of
DNS to do round robin between the 2 records. If one node goes down, DNS
doesn't know anything about this and will continue to hand out responses
pointing to the dead node. So you can have situations where half of the
requests to your nifi-proxy time out because of a dead node.

Instead of using DNS round robin you can use keepalived on Linux. This is
a small program which uses a third IP address, a so-called virtual IP (VIP).
You will have to look at the documentation for keepalived on how to
configure this: https://www.keepalived.org/
You need to make some small adjustments to Linux to allow it to bind to
non-existent IP addresses.

You configure each node in keepalived with a start weight. In my setup I
have configured node-1 with a weight of 100, and node-2 with a weight of 99.
Keepalived is configured so the two keepalived instances on the nodes can
talk together and send keepalive signals to each other with their weight.
Based on the weight it receives from the other keepalived nodes and its own,
each instance decides if it should change state to master or backup. The
node with the highest weight will be master, and the master node will add
the VIP to the node. Now you create only one DNS record for nifi-proxy,
pointing it to the VIP, and all requests will go to only one HAProxy, which
will load balance your traffic to the NiFi nodes.

You can also configure keepalived to use a script to tell if the service
you are running on the host is alive, in this case HAProxy. I have created
a small script which curls the stats page of HAProxy and checks if my
"Backend/nifi-ui-nodes" is up. If it's up, the script just exits with exit
code 0 (OK); otherwise it exits with exit code 1 (error). In the
configuration of keepalived, you configure what should happen to the weight
when the script fails. I have configured the check script to adjust the
weight by -10 in case of an error. So if HAProxy on node-01 dies, or loses
network connection to all NiFi nodes, the check script will fail, and the
weight will be 90 (100-10). Node-01 will receive a keepalive signal from
node-02 with a weight of 99 and therefore will change state to backup and
remove the VIP from the host. Node-02 will receive a keepalive signal from
node-01 with a weight of 90, and since its own weight of 99 is bigger, it
will change state to master and add the VIP to the host. Now it will be
node-02 which receives all requests and load balances all traffic to your
NiFi nodes.

Once again, sorry for the long reply. You don't need 2 HAProxy nodes; one
can do the job, but it will be a single point of failure. You can also just
use DNS round robin to point to two HAProxy nodes, or dive into the use of
keepalived.
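
A minimal sketch of such a keepalived setup, following the description
above (weights 100/99, a check script adjusting the weight by -10). The
interface name, virtual_router_id, VIP and the assumed HAProxy stats
endpoint on localhost:8404 are placeholders to adapt:

# /etc/keepalived/keepalived.conf on node-01 (node-02 is identical except
# state BACKUP and priority 99)
vrrp_script chk_haproxy {
    script "/usr/local/bin/check_haproxy.sh"
    interval 2
    weight -10
}

vrrp_instance NIFI_PROXY {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    virtual_ipaddress {
        192.0.2.10
    }
    track_script {
        chk_haproxy
    }
}

And a check script along the lines described above, curling the HAProxy
stats CSV and exiting 0 only when the nifi-ui-nodes backend reports UP:

#!/bin/sh
# /usr/local/bin/check_haproxy.sh
curl -sf "http://localhost:8404/stats;csv" \
    | grep '^nifi-ui-nodes,BACKEND,' \
    | grep -q ',UP,'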

Kind regards
Jens M. Kofoed




On Thu, Sep 7, 2023 at 13:32, wrote:

>
> Hello Jens
>
> Thanks a lot for the haproxy conf.
>
> Could you give more details about this point:
>
> - I have 2 proxy nodes, which are running in a HA setup with keepalived and
> with a vip.
> - I have a dns record nifi-cluster01.foo.bar pointing to the vip address
> of keepalived
>
> Thanks
>
> Minh
> *Sent:* Thursday, September 7, 2023 at 11:29
> *From:* "Jens M. Kofoed" 
> *To:* users@nifi.apache.org
> *Subject:* Re: Help : LoadBalancer
> Hi
>
> I have a 3 node cluster running behind a HAProxy setup.
> My haproxy.cfg looks like this:
> global
> log stdout format iso local1 debug # rfc3164, rfc5424, short, raw, (iso)
> log stderr format iso local0 err # rfc3164, rfc5424, short, raw, (iso)
> hard-stop-after 30s
>
> defaults
> log global
> mode http
> option httplog
> option dontlognull
> timeout connect 5s
> timeout client 50s
> timeout server 15s
>
> frontend nifi-ui
> bind *:8443
> bind *:443
> mode tcp
> option tcplog
> default_backend nifi-ui-nodes
>
> backend nifi-ui-nodes
> mode tcp
> balance roundrobin
> stick-table type ip size 200k expire 30m
> stick on src
> option httpchk
> http-check send meth GET uri / ver HTTP/1.1 hdr Host nifi-cluster01.foo.bar
> server C01N01 nifi-c01n01.foo.bar:8443 check check-ssl verify none inter 5s downinter 5s fall 2 rise 3
> server C01N02 nifi-c01n02.foo.bar:8443 check check-ssl verify none inter 5s downinter 5s fall 2 rise 3
> server C01N03 nifi-c01n03.foo.b

Re: Help : LoadBalancer

2023-09-07 Thread Jens M. Kofoed
Hi

I have a 3 node cluster running behind a HAProxy setup.
My haproxy.cfg looks like this:
global
log stdout format iso local1 debug # rfc3164, rfc5424, short, raw, (iso)
log stderr format iso local0 err # rfc3164, rfc5424, short, raw, (iso)
hard-stop-after 30s

defaults
log global
mode http
option httplog
option dontlognull
timeout connect 5s
timeout client 50s
timeout server 15s

frontend nifi-ui
bind *:8443
bind *:443
mode tcp
option tcplog
default_backend nifi-ui-nodes

backend nifi-ui-nodes
mode tcp
balance roundrobin
stick-table type ip size 200k expire 30m
stick on src
option httpchk
http-check send meth GET uri / ver HTTP/1.1 hdr Host nifi-cluster01.foo.bar
server C01N01 nifi-c01n01.foo.bar:8443 check check-ssl verify none inter 5s downinter 5s fall 2 rise 3
server C01N02 nifi-c01n02.foo.bar:8443 check check-ssl verify none inter 5s downinter 5s fall 2 rise 3
server C01N03 nifi-c01n03.foo.bar:8443 check check-ssl verify none inter 5s downinter 5s fall 2 rise 3

I have 2 proxy nodes, which are running in an HA setup with keepalived and
with a VIP.
I have a DNS record nifi-cluster01.foo.bar pointing to the VIP address of
keepalived.

In your nifi.properties file you would have to set a proxy host address:
nifi.web.proxy.host=nifi-cluster01.foo.bar:8443

This setup is working for me.

Kind regards
Jens M. Kofoed



On Wed, Sep 6, 2023 at 16:17, Minh HUYNH wrote:

> Hello Juan
>
> Not sure if you understood my point of view?
>
> I've got a cluster nifi01/nifi02/nifi03
>
> I try to use a unique URL, for instance https://nifi_clu01:9091/nifi; this
> link points randomly to nifi01/nifi02/nifi03
>
> Regards
>
>
> *Sent:* Wednesday, September 6, 2023 at 16:05
> *From:* "Juan Pablo Gardella" 
> *To:* users@nifi.apache.org
> *Subject:* Re: Help : LoadBalancer
> List all servers you need.
>
> server server1 "${NIFI_INTERNAL_HOST1}":8443 ssl
> server server2 "${NIFI_INTERNAL_HOST2}":8443 ssl
>
> On Wed, Sep 6, 2023 at 10:35 AM Minh HUYNH  wrote:
>
>> Thanks a lot for the reply.
>>
>> Concerning redirection to one node, it is OK, we got it.
>>
>> But how do we configure nifi and haproxy to point to the cluster nodes, for
>> instance cluster nodes "nifi01, nifi02, nifi03"?
>>
>> regards
>>
>> Minh
>>
>>
>>
>> *Sent:* Wednesday, September 6, 2023 at 15:29
>> *From:* "Juan Pablo Gardella" 
>> *To:* users@nifi.apache.org
>> *Subject:* Re: Help : LoadBalancer
>> I did that multiple times. Below is how I configured it:
>>
>> frontend http-in
>> # bind ports section
>> acl prefixed-with-nifi path_beg /nifi
>> use_backend nifi if prefixed-with-nifi
>> option forwardfor
>>
>> backend nifi
>> server server1 "${NIFI_INTERNAL_HOST}":8443 ssl
>>
>>
>>
>> On Wed, Sep 6, 2023 at 9:40 AM Minh HUYNH  wrote:
>>
>>>
>>> Hello,
>>>
>>> I have been trying for a long time to configure a NiFi cluster behind
>>> the haproxy/loadbalancer,
>>> but until now it has always failed.
>>> I only get access to the welcome page of NiFi; all other
>>> links fail.
>>>
>>> If someone has a working configuration, it would be helpful.
>>>
>>> Thanks a lot
>>>
>>> Regards
>>>
>>
>>
>>
>
>
>


Re: NiFi not rolling logs

2023-07-08 Thread Jens M. Kofoed
Hi

Please have a look at this old jira:
https://issues.apache.org/jira/browse/NIFI-2203
I have had issues where a processor creates a log message every 10 ms,
resulting in the disk filling up. For me it seems like the maxHistory
setting only affects how many files, as defined by the rolling pattern, are
kept. If you have defined it like this:
${org.apache.nifi.bootstrap.config.log.dir}/nifi-app_%d{yyyy-MM-dd}.%i.log
maxHistory only affects the days, not the incremented %i files per day. So
you can still have thousands of files in one day.
The totalSizeCap will delete the oldest files if the total size hits the cap
setting.

The totalSizeCap has been added in the logback.xml file for nifi-registry,
where it was added inside the rollingPolicy section. I could not get it to
work inside the rollingPolicy section in nifi, but just added it in the
appender section. See my comment in the jira:
https://issues.apache.org/jira/browse/NIFI-2203
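
For illustration, a sketch of an appender along these lines, modeled on
NiFi's stock nifi-app appender with the two elements Mike suggests below
added. Note that logback documents totalSizeCap inside the rolling policy,
as shown here, while the report above is that NiFi needed it at the appender
level, so placement may require experimenting:

<appender name="APP_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
        <fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app_%d{yyyy-MM-dd}.%i.log</fileNamePattern>
        <maxFileSize>100MB</maxFileSize>
        <maxHistory>30</maxHistory>
        <cleanHistoryOnStart>true</cleanHistoryOnStart>
        <totalSizeCap>50GB</totalSizeCap>
    </rollingPolicy>
    <encoder>
        <pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
    </encoder>
</appender>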

Kind regards
Jens M. Kofoed

On Sat, Jul 8, 2023 at 04:27, Mike Thomsen wrote:

> Yeah, I'm working through some of it where I have time. I plan to have a
> Jira up this weekend. I'm wondering, though, if we shouldn't consider a
> spike for switching to log4j2 in 2.X because I saw a lot of complaints
> about logback being inconsistent in honoring its settings.
>
> On Fri, Jul 7, 2023 at 10:19 PM Joe Witt  wrote:
>
>> H.  Interesting.  Can you capture these bits of fun in a jira?
>>
>> Thanks
>>
>> On Fri, Jul 7, 2023 at 7:17 PM Mike Thomsen 
>> wrote:
>>
>>> After doing some research, it appears that <maxHistory> is a wonky
>>> setting WRT how well it's honored by logback. I let a GenerateFlowFile >
>>> LogAttribute flow run for a long time, and it just kept filling up. When I
>>> added <totalSizeCap> that appeared to force expected behavior on total log
>>> size. We might want to add the following:
>>>
>>> <cleanHistoryOnStart>true</cleanHistoryOnStart>
>>> <totalSizeCap>50GB</totalSizeCap>
>>>
>>> On Fri, Jul 7, 2023 at 11:33 AM Michael Moser 
>>> wrote:
>>>
>>>> Hi Mike,
>>>>
>>>> You aren't alone in experiencing this.  I think logback uses a pattern
>>>> matcher on filename to discover files to delete.  If "something" happens
>>>> which causes a gap in the date pattern, then the matcher will fail to
>>>> pick up and delete files on the other side of that gap.
>>>>
>>>> Regards,
>>>> -- Mike M
>>>>
>>>>
>>>> On Thu, Jul 6, 2023 at 10:28 AM Mike Thomsen 
>>>> wrote:
>>>>
>>>>> We are using the stock configuration, and have noticed that we have a
>>>>> lot of nifi-app* logs that are well beyond the historic data cap of 30 
>>>>> days
>>>>> in logback.xml; some of those logs go back to April. We also have a bunch
>>>>> of 0 byte nifi-user logs and some of the other logs are 0 bytes as well. 
>>>>> It
>>>>> looks like logback is rotating based on time, but isn't cleaning up. Is
>>>>> this expected behavior or a problem with the configuration?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Mike
>>>>>
>>>>


Re: Embedded Hazelcast Cachemanager

2023-02-22 Thread Jens M. Kofoed
Hi Isha

We are using Redis in a 3 node Redis Sentinel cluster for HA purposes. It
works fine.

Kind regards
Jens M. Kofoed

On Wed, Feb 22, 2023 at 11:36, Isha Lamboo <isha.lam...@virtualsciences.nl>
wrote:

> Hi Simon,
>
> Thanks for your explanation. It will help me manage expectations with the
> team that developed the flow. We were hoping to do exactly as you suggest,
> drop in a redundant cache without the time and resource investment of
> setting up an external cluster like Redis or Hazelcast. And in fact, it
> runs fine on most days, but as currently set up it doesn't play nice when
> the load on the cluster gets too high or nodes disconnect.
>
> If I get the time to run some tests I'll share the results, but for now
> I'll advise the devs to accept a longer run and schedule the
> DetectDuplicate less often or to revert to using the
> DistributedMapCacheServer on a single node again. If neither is acceptable
> they can request an external cache service cluster.
>
> Thank you very much,
>
> Isha
>
> -Original message-
> From: Simon Bence 
> Sent: Wednesday, February 22, 2023 10:47
> To: users@nifi.apache.org
> Subject: Re: Embedded Hazelcast Cachemanager
>
> Hi Isha,
>
> Without a deeper understanding of the situation I am not sure if the load
> comes entirely from this part of the given batch processing, but for the
> scope of the discussion I do assume so, along with the assumption that this
> shows a drastic contrast with the same measurements using
> DistributedMapCache as the cache.
>
> The EmbeddedHazelcastCacheManager was primarily added for simpler
> scenarios as an out-of-the-box solution that might be "grabbed to the
> canvas" without much fuss. Because of this, it has very limited
> customisation capabilities. As your scenario looks to utilize Hazelcast
> heavily, this might not be the ideal tool. It is also important to mention
> that in the embedded approach, the Hazelcast instances run on the same
> servers, thus adding to the load already produced by other parts of
> the flow.
>
> Using ExternalHazelcastCacheManager can provide much more flexibility: as
> it works with standalone Hazelcast instances, this approach opens up the
> whole range of Hazelcast's performance optimization capabilities. You can
> use either one single instance touched by all the nodes (which comes with
> no synchronization between Hazelcast nodes but might be a bottleneck at
> some point) or even build up a separate cluster. Of course, the results
> depend highly on network topology and other factors specific to your use
> case.
>
> Also I am not sure about the details of your flows or whether you prefer
> processing time over throughput, but it is also a possible optimization
> opportunity to distribute the batch in time, resulting in smaller peaks.
>
> Best regards,
> Bence
>
>
> > On 2023. Feb 21., at 21:45, Isha Lamboo 
> wrote:
> >
> > Hi Simon,
> >
> > The Hazelcast cache is being used by a DetectDuplicate processor to
> cache and eliminate message ids. These arrive in large daily batches with
> 300-500k messages, most (90+%) of which are actually duplicates. This was
> previously done with a DistributedMapCacheServer, but that involved using
> only one of the nodes (hardcoded in the MapCacheClient controller), giving
> us a single point of failure for the flow. We had hoped to use Hazelcast to
> have a redundant cacheserver, but I’m starting to think that this scenario
> causes too many concurrent updates of the cache, on top of the already
> heavy load from other processing on the batch.
> >
> > What was new to me is the CPU load on the cluster in question going
> through the roof, on all 3 nodes. I have no idea how a 16 vCPU server gets
> to a load of 100+.
> >
> > The start roughly coincides with the arrival of the daily batch, though
> there may have been other batch processes going on since it’s a Sunday.
> However, the queues were pretty much empty again in an hour and yet the
> craziness kept going until I finally decided to restart all nodes.
> > 
> >
> > The hazelcast troubles might well be a side-effect of the NiFi servers
> being overloaded. There could have been issues at the Azure VM level etc.
> But activating the Hazelcast controller is the only change I *know* about.
> And it doesn’t seem farfetched that it got into a loop trying to
> migrate/copy partitions “lost” on other nodes.
> >
> > I’ve attached a file with selected hazelcast warnings and errors from
> the nifi-app.log files, trying to include as many unique ones as possible.
> >
> > The errors that kept repeating where these (always together):
> >
> > 2

Re: Processor with cron scheduling in middle of flow

2023-02-22 Thread Jens M. Kofoed
Hi Mark
We have many List/Get processors which is running via cron. Some systems
export data to disk every hour, but the systems can't block read acces to
the files while writing them. So NiFi can pull the same file multiple times
and tries to delete it while the file is written. But we know that the
export only takes 10 minutes. Therefore we use a CRON to get files between
0 0 15-55 * *
We have similar issues with other systems only providing data or are
accessibly at specific time slots.

To John:
Could you use a Notify/Wait gate function, where a Wait processor is
blocking flowfiles to the MergeContent processor? In another flow, use a
GenerateFlowFile and a Notify processor to open the gate (Wait processor).
After the MergeContent you could have a Notify processor to close the gate
again.
In this way, you would get many flowfiles into the MergeContent process at
the same time.

Kind regards
Jens M. Kofoed


On Feb 22, 2023, at 15:24, Mark Payne wrote:

Interesting. Thanks for that feedback Harald. It might make sense to be
more surgical about this, disabling it for MergeContent, for example,
instead of all interflow processors.

Thanks
-Mark


On Feb 22, 2023, at 5:42 AM, Dobbernack, Harald (Key-Work) <
harald.dobbern...@key-work.de> wrote:


Just responding to this part:

You should not be using CRON driven for any processors in the middle of a
flow. In fact, we really should probably just disable that altogether.

Please don't disable this! We actually use CRON for some of our PutSFTP
processors, as there are service times for these SFTP servers we are
supposed to respect; outside them the SFTP will actually not be available...
Of course we could also route to a wait processor if we have arrived at a
time when the destination should not be called, but it is so much simpler
being able to tell the processor in the middle of the flow when not
to run.


-Original Message-

From: Mark Payne 

Sent: Tuesday, February 21, 2023 21:37

To: users@nifi.apache.org; John McGinn 

Subject: Re: Processor with cron scheduling in middle of flow


Key-Work IT Security: This is an external email. Please only click links or
attachments if the authenticity of the message is clear.


John,


You should not be using CRON driven for any processors in the middle of a
flow. In fact, we really should probably just disable that altogether.

In fact, it’s exceedingly rare that you’d want anything other than
Timer-Driven with a Run Schedule of 0 sec.

MergeContent will not create any merged output on its first iteration after
it’s scheduled to run. It requires at least a second iteration before
anything is transferred. Its algorithm has evolved over time, and it may
well have happened to work previously but it’s really not being configured
as intended.


When you say that you’re retrieving data from a few sources and then
“merges that all back into a single file” - does that mean that you started
with one FlowFile, split it up, and then want to re-assemble the data after
performing enrichment? If so you’ll want to use a Merge Strategy of
Defragment.


If you’re trying to just bring in some data and merge it together by
correlation attribute, then Bin Packing makes sense. Here, you have a few
properties that you can use to try to get the best bin packing. In short, a
bin will be created when any of these conditions is met:


- The Minimum Group Size is reached AND the Minimum Number of Entries is met

- The Maximum Group Size OR the Maximum Number of Entries is met

- A bin has been sitting for “Max Bin Age” amount of time

- If a correlation attribute is used, and a FlowFile comes in that can’t go
into any bin, it will evict the oldest.
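
Restated as a sketch for readability (the names here are invented for
illustration and do not mirror MergeContent's internals), a bin is merged
when:

// Illustrative restatement of the bin-completion conditions listed above.
static boolean binIsReadyToMerge(long size, long entries, long binAgeMillis,
                                 long binCount, boolean incomingNeedsNewBin,
                                 long minGroupSize, long minEntries,
                                 long maxGroupSize, long maxEntries,
                                 long maxBinAgeMillis, long maxBins) {
    return (size >= minGroupSize && entries >= minEntries)  // both minimums met
        || (size >= maxGroupSize || entries >= maxEntries)  // a maximum reached
        || (binAgeMillis >= maxBinAgeMillis)                // bin timed out
        || (binCount >= maxBins && incomingNeedsNewBin);    // oldest bin evicted
}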


If you’re seeing bins smaller than expected, you can look at the Data
Provenance for the merged FlowFile, and it will tell you exactly which of
the conditions above triggered the data to be merged. This may help to
adjust these settings.


Hope this is helpful.


Thanks

-Mark



On Feb 17, 2023, at 1:39 PM, John McGinn via users 
wrote:


Hello,


NiFi 1.19.0 - I need some help in trying to make my idea work, or figure
out the better way to do this.


I've got a flow that retrieves data from a few data sources, enhances
individual flow files, converts attributes to CSV and then merges that all
back into a single file. It takes roughly 20 minutes for the process to run
from start to the MergeContent part, so when I do it manually, I stop the
MergeContent processor until all flowfiles are in the queue waiting, and
then I start the MergeContent processor. (Run One Time doesn't work for
some reason.) That works fine, manually.


When I try to put cron scheduling in, it never kicks off. For instance, the
initial processor in the flow has a cron schedule of the top of the hour.
(0 0 * * * ?) I then put 25 past the hour for Merge Content (0 25 * * * ?).
When I start the flow, the flowfiles are generated and queue up in front of
Merge

How to export the "Flow Configuration History" logs to external systems?

2023-02-09 Thread Jens M. Kofoed
Hi

I'm trying to find the information about flow changes in the log files, but
without much success :-(
I can find part of the information in the different log files, but I'm not
able to decode or find all the information about users starting/stopping
processors, or making changes.

Would it be possible to create a log which includes the same information
that is in the Flow Configuration History log internally in the GUI?

Kind regards
Jens M. Kofoed


Re: CSV with header - but always, even for 0 record flowfiles

2022-12-20 Thread Jens M. Kofoed
As a workaround I have been using a CalculateRecordStats and a
RouteOnAttribute to route files with 0 records, and only fix those 0-record
files, so it doesn't influence all the correct files.

Kind regards
Jens M. Kofoed

On Dec 19, 2022, at 17:40, Joe Witt wrote:


Josef

Please file a JIRA with this information above. Makes sense what you're
looking for.  Just not sure where this 'concern' would live whether it is
in the processors themselves or the controller services for the writers.

Thanks

On Mon, Dec 19, 2022 at 9:21 AM  wrote:

> Hi guys
>
>
>
> I’ve got a pretty basic question. We use a “ConvertRecord” processor where
> we convert an AVRO to a CSV. For that CSV output we would like to have the
> header enabled, so we tried to set “Include Header Line – true” for the
> Controller Service of the CSVRecordSetWriter. The issue is, if we have zero
> records, the header doesn’t show up (but it was there of course in the AVRO
> file). We need to have it as the columns are important for us, even if we
> have 0 records. At the moment we solve it with an extra ExecuteScript
> processor just before the ConvertRecord; there we always add an extra
> record with the header lines as a string. But it feels a bit hacky as the
> record.count attribute is 1 record too high (due to the fake header
> record). Does anybody have another easy approach :-)?
>
>
>
> Cheers Josef
>
>
>
>
>
> 
>
>


Re: nifi-api with a server secured with Microsoft AD

2022-11-01 Thread Jens M. Kofoed
Hi David

It's also possible to configure authorizers.xml to handle both LDAP and
local users (file-access) so you can have both. It's using the
composite-configurable-user-group-provider. Just remember that nifi is case
sensitive, so what you specify as the user should match exactly what nifi
sees in the certificate. Some certificates are created with a space after
commas, like ", ou=", and others don't: ",ou=".
I'm using this where all the regular users are from AD, but you can still
create local users which use a certificate.
From authorizers.xml:


<userGroupProvider>
    <identifier>file-user-group-provider</identifier>
    <class>org.apache.nifi.authorization.FileUserGroupProvider</class>
    <property name="Users File">./conf/localAuth/users.xml</property>
    <property name="Initial User Identity 1">cn=name, ou=users, dc=foo, dc=bar</property>
</userGroupProvider>

<userGroupProvider>
    <identifier>ldap-user-group-provider</identifier>
    <class>org.apache.nifi.ldap.tenants.LdapUserGroupProvider</class>
    ...
</userGroupProvider>

<userGroupProvider>
    <identifier>composite-configurable-user-group-provider</identifier>
    <class>org.apache.nifi.authorization.CompositeConfigurableUserGroupProvider</class>
    <property name="Configurable User Group Provider">file-user-group-provider</property>
    <property name="User Group Provider 1">ldap-user-group-provider</property>
</userGroupProvider>

<accessPolicyProvider>
    <identifier>file-access-policy-provider</identifier>
    <class>org.apache.nifi.authorization.FileAccessPolicyProvider</class>
    <property name="User Group Provider">composite-configurable-user-group-provider</property>
    <property name="Authorizations File">./conf/localAuth/authorizations.xml</property>
    <property name="Initial Admin Identity">cn=name, ou=users, dc=foo, dc=bar</property>
    <property name="Node Group">NIFI-CLUSTER01-NODES</property>
</accessPolicyProvider>

<authorizer>
    <identifier>managed-authorizer</identifier>
    <class>org.apache.nifi.authorization.StandardManagedAuthorizer</class>
    <property name="Access Policy Provider">file-access-policy-provider</property>
</authorizer>



From nifi.properties:
nifi.security.user.authorizer=managed-authorizer


In this way, you can create certificates yourself, from the CA you used to
create the certificates for nifi, and create these as internal local users,
and still use AD.
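
As an illustration of the certificate-based API call Shawn describes below,
assuming a PEM key/cert pair issued by the CA that NiFi trusts (the file
names and the endpoint are examples only):

curl --cert user.crt --key user.key \
     --cacert nifi-ca.crt \
     https://nifi.foo.bar:8443/nifi-api/flow/status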

Kind regards
Jens M. Kofoed


On Tue, Nov 1, 2022 at 16:47, David Early via users <users@nifi.apache.org>
wrote:

> Mike and Shawn,  thanks for the feedback, have not had a chance to try
> either, but appreciate your help.  Will be trying the cert this week, will
> reach out to the AD managers about a more direct AD solution.
>
> Dave
>
> On Sat, Oct 29, 2022 at 7:10 PM Mike Thomsen 
> wrote:
>
>> David,
>>
>> Another option you might want to explore is having AD generate client
>> certificates for your users.
>>
>> On Sat, Oct 29, 2022 at 12:01 PM Shawn Weeks 
>> wrote:
>> >
>> > NiFi should always accept a cert at the rest api if you provide one. If
>> you're using curl just add the “--key” and “--cert” options and call
>> whatever api url you're trying directly. You'll need to make sure that the
>> cert you're using is signed by the same local CA that NiFi is set to trust
>> and that you've added a user in NiFi that matches the common name on the
>> cert or whatever regex you set for
>> “nifi.security.identity.mapping.value.pattern”
>> >
>> > Thanks
>> > Shawn
>> >
>> > > On Oct 28, 2022, at 3:55 PM, David Early via users <
>> users@nifi.apache.org> wrote:
>> > >
>> > > Hi all,
>> > >
>> > > We have a 3 node cluster secured with Microsoft AD for the first time.
>> > >
>> > > I need access to the REST api.  The nifi-api/access/token does not
>> work in this case.
>> > >
>> > > We did use a local CA for certificate generation on the servers.
>> > >
>> > > I am reading that it is possible to do certificate based auth to the
>> api. We need this in a script (python) to run on a remote server which is
>> checking for old flowfiles that can get stuck in a few places.
>> > >
>> > > Can I use cert based API connection when using AD as the main
>> authentication/authorization for the ui?
>> > >
>> > > Anything special that needs to be done?  I've just not used certs
>> with the api before, but we have used cert based site to site on other
>> systems and it works fine.  Just not sure how to do it with nipyapi or just
>> from curl on the cli.
>> > >
>> > > David
>> >
>>
>
>
> --
> David Early, Ph.D.
> david.ea...@grokstream.com
> 720-470-7460 Cell
>
>


Re: Need help to merge all records in cluster into one flowfile

2022-08-31 Thread Jens M. Kofoed
Hi Chris and Mark

Many thanks for your reply. You are totally right (of course :-) ) and that
is also the knowledge and understanding I had (have), except that the EL
will only be evaluated using the variable registry. Sorry for that :-)

My goal is to have only one flow file, with all records from the
SiteToSiteStatusReportingTask. In my following flow, I'm checking for
backpressures and queues and creating triggers for other systems. If 2 or
more flow files are processed after each other, a backpressure issue in the
first flow file will be overwritten with an OK from the next flow file.
That's why I want to merge them.

To debug what is going on, I create the report every minute by cron
(*/1 * * * *), and most of the time (more than 90%) all 3 flow files are
merged into 1 flow file because the Minimum Number of Records is reached AND
the Minimum Bin Size is reached. The 3 flow files are merged within
milliseconds. So the processor does exactly what it should do.
The issue is that sometimes it doesn't merge; it times out, hitting the
"Max Bin Age". The 3 flow files have exactly the same number of records, and
the size of course varies depending on the amount of data in queues across
the whole flow.
With the Minimum Bin Size set to 0B it should only be the "Minimum Number
of Records" which comes into action. And as I wrote above, it works great
most of the time.
If I change the SiteToSiteStatusReportingTask to create reports with a
lower batch size, each node will create more flow files. And with 9 flow
files at the input port, which together have exactly the same number of
records and size as the 3 flow files, they will never all bin together in
one file. If I change the Max Bin Age, it just takes longer for the process
to "time out".

So my issue is more about why the 3 flow files merge 90+% of the time but
not all the time, since the number of records is the same. That is
what worries me.

But thanks Mark, I will give MergeContent a try and fix the json array
so it doesn't break afterwards in the following flow.

But many thanks again to both of you.

kind regards
Jens M. Kofoed



On Wed, Aug 31, 2022 at 16:07, Mark Payne wrote:

> Thanks Chris. That’s exactly right.
>
> Given that you’re seeing the Max Bin Age is the cause, the solution would
> be to increase the max bin age if you want fewer FlowFiles.
>
> The data is merged when any one of the following conditions is met:
>
> - Minimum Number of Records is reached AND Minimum Bin Size is reached
> OR
> - Maximum Number of Records is reached OR Maximum Bin Size is reached
> OR
> - Max Bin Age is reached
> OR
> - Maximum Number of Bins is reached AND a new FlowFile is encountered that
> belongs in a different bin than any of the existing ones (only valid if
> using a Correlation Attribute).
>
> So in your case, you’re not hitting the minimum number of records, but you
> are hitting the Max Bin Age so it’s merging.
> The idea behind Max Bin Age is that it’s basically a timeout. It prevents
> data from stacking up for too long, introducing too large of a latency.
>
> Now, that said, what you’re after is really not something that’s as easily
> supported by this Processor. Because you’re not really looking to pack
> together Records in order to build a larger bundle. You’re looking to pack
> together records in order to re-join specific sets of Records. So you might
> actually want to consider using MergeContent instead of MergeRecord.
> Assuming that your data is in JSON format, you can use MergeContent’s
> header/footer/demarcator properties to ensure that you still have valid
> JSON. But with MergeContent you specify min/max based on number of
> FlowFiles, not number of Records. So you can set Minimum Entries to 3
> (assuming you have 3 nodes in your cluster). So that’ll wait for 3
> FlowFiles. Presumably one from each node. And set a Max Bin Age short enough
> that even if a node doesn’t send because the node is stopped, you still
> merge data from the other 2 nodes or whatever.
>
> Thanks
> -Mark
>
>
>
> On Aug 31, 2022, at 7:45 AM, Chris Sampson 
> wrote:
>
> For “Minimum Number of Records”, the docs [1] indicate that the field does
> support Expression Language but "will be evaluated using variable
> registry only”, i.e. it doesn’t use FlowFile attributes, which it appears
> you’re trying to do in your example within this email chain.
>
> If your provenance is showing that "Records Merged due to: Bin has reached
> Max Bin Age”, wouldn’t it be a good idea to increase the “Max Bin Age” from
> the “10s” you indicate in your original email? If you set this to, say,
> “5mins”, do you see the number of resultant FlowFiles reduce with more
> input Records included within each output FlowFile?
>
> Basically, your provenance seems to sugge

Re: Need help to merge all records in cluster into one flowfile

2022-08-31 Thread Jens M. Kofoed
Hi
By decreasing the batch size for the SiteToSiteStatusReportingTask I get
even more flowfiles. So just for testing I now have a total of 9 files
(2.75 MB) in the incoming queue to the MergeRecord.
The total number of records is above 2000, so I have set the "Minimum Number
of Records" to 1500 and the "Minimum Bin Size" to 2 MB.
The result is 3 flowfiles, which all have "Records Merged due to: Bin
has reached Max Bin Age". Why?
All 9 files should be merged into one file, since the total number of
records exceeds the minimum.

Kind regards
Jens M. Kofoed

On Wed, Aug 31, 2022 at 09:50, Jens M. Kofoed <jmkofoed@gmail.com>
wrote:

> Hey Mark
>
> I tried another idea to dynamically set the "Minimum Number of Records" by
> EL. Editing the field it says that EL is supported, so I tried this:
> ${record.count:minus(1):multiply(3)}
>
> But the processor does not like this:
> Perform Validation
> nifi.mydomain.com:8443 - Component is invalid: 'Component' is invalid
> because Failed to perform validation due to
> java.lang.NumberFormatException: For input string: ""
>
> I got the same error if I just tried to set the EL to: ${record.count}
>
> Is this a bug???
>
> Kind regards
> Jens
>
>
> On Wed, Aug 31, 2022 at 09:24, Jens M. Kofoed <jmkofoed@gmail.com>
> wrote:
>
>> Hey Mark
>>
>> Many thanks for your reply. But it's in fact the Details field which does
>> not help me.
>> At 08:16:00 all 3 nodes generate a SiteToSiteStatusReport.
>> At 08:16:11.003 the MergeRecords have a JOIN event. Joining 2 files:
>> "Records Merged due to: Bin has reached Max Bin Age"
>> At 08:16:11.008 the MergeRecords have another JOIN event. Joining 1 file:
>> "Records Merged due to: Bin has reached Max Bin Age"
>>
>> So one file is 0.005s younger than the other 2 files and therefore is not
>> merged into the first bin of files. But how can we force all flowfiles to
>> be merged into one flowfile?
>> If I set the minimum file size or records to be within range of the >2
>> files and <3 files, it will trigger a merge. But when we create more flows
>> the records and filesize will increase and we will be back to the problem
>> that not all files will be merged into one.
>>
>> kind regards
>> Jens
>>
>> On Tue, Aug 30, 2022 at 15:40, Mark Payne wrote:
>>
>>> Hey Jens,
>>>
>>> My recommendation is to take a look at the data provenance for
>>> MergeRecord (i.e., right-click on the Processor and go to Data Provenance.)
>>> Click the little ‘i’ icon on the left for one of the JOIN events.
>>> There, it will show a “Details” field, which will tell you why it merged
>>> the data in the bin.
>>> Once you understand why it’s merging the data with only 2 FlowFiles, you
>>> should be to understand how to adjust your configuration to avoid doing
>>> that.
>>>
>>> Thanks
>>> -Mark
>>>
>>>
>>> > On Aug 30, 2022, at 2:31 AM, Jens M. Kofoed <
>>> jmkofoed.ube+n...@gmail.com> wrote:
>>> >
>>> > Hi all
>>> >
>>> > I'm running a 3 node cluster at version 1.16.2. I'm using the
>>> SiteToSiteStatusReportingTask to monitor and check for any backpressures or
>>> queues. I'm trying to merge all 3 reports into 1, but most of the time I
>>> get 2 flowfiles after my MergeRecord.
>>> >
>>> > To be sure the nodes are creating the reports at the same time the
>>> SiteToSiteStatusReportingTask is set to schedule via CRON driver every 5
>>> mins.
>>> > The connection from the input port to the next process is set with
>>> "Load Balance Strategy" to Single node, to be sure all 3 reports are at the
>>> same node.
>>> > In my MergeRecord the "Correlation Attribute Name" is set to
>>> "reporting.task.uuid" which is the same for all 3 flowfiles.
>>> > "Minimum Number of Records" is set to 5000, which is much higher than
>>> the total amounts of records.
>>> > "Minimum Bin Size" is set to 5 MB, which is also much higher than the
>>> total size. Maximum "Number of Bins" is at default: 10
>>> > "Max Bin Age" is set to 10 s.
>>> >
>>> > With these settings I was hoping that all 3 reports would be at the
>>> same node within a few seconds, and that MergeRecord would merge all 3
>>> flowfiles into 1. But many times MergeRecord outputs 2 flowfiles.
>>> >
>>> > Any ideas how to force all into one flowfile.
>>> >
>>> > Kind regards
>>> > Jens M. Kofoed
>>>
>>>


Re: Need help to merge all records in cluster into one flowfile

2022-08-31 Thread Jens M. Kofoed
Hey Mark

I tried another idea to dynamically set the "Minimum Number of Records" by
EL. Editing the field it says that EL is supported, so I tried this:
${record.count:minus(1):multiply(3)}

But the processor does not like this:
Perform Validation
nifi.mydomain.com:8443 - Component is invalid: 'Component' is invalid
because Failed to perform validation due to
java.lang.NumberFormatException: For input string: ""

I got the same error if I just tried to set the EL to: ${record.count}

Is this a bug???

Kind regards
Jens


On Wed, Aug 31, 2022 at 09:24, Jens M. Kofoed <jmkofoed@gmail.com>
wrote:

> Hey Mark
>
> Many thanks for your reply. But it's in fact the Details field which does
> not help me.
> At 08:16:00 all 3 nodes generate a SiteToSiteStatusReport.
> At 08:16:11.003 the MergeRecords have a JOIN event. Joining 2 files:
> "Records Merged due to: Bin has reached Max Bin Age"
> At 08:16:11.008 the MergeRecords have another JOIN event. Joining 1 file:
> "Records Merged due to: Bin has reached Max Bin Age"
>
> So one file is 0.005s younger than the other 2 files and therefore is not
> merged into the first bin of files. But how can we force all flowfiles to
> be merged into one flowfile?
> If I set the minimum file size or records to be within range of the >2
> files and <3 files, it will trigger a merge. But when we create more flows
> the records and filesize will increase and we will be back to the problem
> that not all files will be merged into one.
>
> kind regards
> Jens
>
> On Tue, Aug 30, 2022 at 15:40, Mark Payne wrote:
>
>> Hey Jens,
>>
>> My recommendation is to take a look at the data provenance for
>> MergeRecord (i.e., right-click on the Processor and go to Data Provenance.)
>> Click the little ‘i’ icon on the left for one of the JOIN events.
>> There, it will show a “Details” field, which will tell you why it merged
>> the data in the bin.
>> Once you understand why it’s merging the data with only 2 FlowFiles, you
>> should be to understand how to adjust your configuration to avoid doing
>> that.
>>
>> Thanks
>> -Mark
>>
>>
>> > On Aug 30, 2022, at 2:31 AM, Jens M. Kofoed <
>> jmkofoed.ube+n...@gmail.com> wrote:
>> >
>> > Hi all
>> >
>> > I'm running a 3 node cluster at version 1.16.2. I'm using the
>> SiteToSiteStatusReportingTask to monitor and check for any backpressures or
>> queues. I'm trying to merge all 3 reports into 1, but most of the time I
>> get 2 flowfiles after my MergeRecord.
>> >
>> > To be sure the nodes are creating the reports at the same time the
>> SiteToSiteStatusReportingTask is set to schedule via CRON driver every 5
>> mins.
>> > The connection from the input port to the next process is set with
>> "Load Balance Strategy" to Single node, to be sure all 3 reports are at the
>> same node.
>> > In my MergeRecord the "Correlation Attribute Name" is set to
>> "reporting.task.uuid" which is the same for all 3 flowfiles.
>> > "Minimum Number of Records" is set to 5000, which is much higher than
>> the total amounts of records.
>> > "Minimum Bin Size" is set to 5 MB, which is also much higher than the
>> total size. Maximum "Number of Bins" is at default: 10
>> > "Max Bin Age" is set to 10 s.
>> >
>> > With these settings I was hoping that all 3 reports would be at the
>> same node within a few seconds, and that MergeRecord would merge all 3
>> flowfiles into 1. But many times MergeRecord outputs 2 flowfiles.
>> >
>> > Any ideas how to force all into one flowfile.
>> >
>> > Kind regards
>> > Jens M. Kofoed
>>
>>


Re: Need help to merge all records in cluster into one flowfile

2022-08-31 Thread Jens M. Kofoed
Hey Mark

Many thanks for your reply. But it's in fact the Details field which does
not help me.
At 08:16:00 all 3 nodes generate a SiteToSiteStatusReport.
At 08:16:11.003 the MergeRecords have a JOIN event. Joining 2 files:
"Records Merged due to: Bin has reached Max Bin Age"
At 08:16:11.008 the MergeRecords have another JOIN event. Joining 1 file:
"Records Merged due to: Bin has reached Max Bin Age"

So one file is 0.005s younger than the other 2 files and therefore is not
merged into the first bin of files. But how can we force all flowfiles to
be merged into one flowfile?
If I set the minimum file size or record count to be within the range of >2
files and <3 files, it will trigger a merge. But when we create more flows,
the record count and file size will increase and we will be back to the
problem that not all files are merged into one.

kind regards
Jens

On Tue, Aug 30, 2022 at 15:40, Mark Payne wrote:

> Hey Jens,
>
> My recommendation is to take a look at the data provenance for MergeRecord
> (i.e., right-click on the Processor and go to Data Provenance.) Click the
> little ‘i’ icon on the left for one of the JOIN events.
> There, it will show a “Details” field, which will tell you why it merged
> the data in the bin.
> Once you understand why it’s merging the data with only 2 FlowFiles, you
> should be to understand how to adjust your configuration to avoid doing
> that.
>
> Thanks
> -Mark
>
>
> > On Aug 30, 2022, at 2:31 AM, Jens M. Kofoed 
> wrote:
> >
> > Hi all
> >
> > I'm running a 3 node cluster at version 1.16.2. I'm using the
> SiteToSiteStatusReportingTask to monitor and check for any backpressures or
> queues. I'm trying to merge all 3 reports into 1, but most of the time I
> get 2 flowfiles after my MergeRecord.
> >
> > To be sure the nodes are creating the reports at the same time the
> SiteToSiteStatusReportingTask is set to schedule via CRON driver every 5
> mins.
> > The connection from the input port to the next process is set with "Load
> Balance Strategy" to Single node, to be sure all 3 reports are at the same
> node.
> > In my MergeRecord the "Correlation Attribute Name" is set to
> "reporting.task.uuid" which is the same for all 3 flowfiles.
> > "Minimum Number of Records" is set to 5000, which is much higher than
> the total amounts of records.
> > "Minimum Bin Size" is set to 5 MB, which is also much higher than the
> total size. Maximum "Number of Bins" is at default: 10
> > "Max Bin Age" is set to 10 s.
> >
> > With these settings I was hoping that all 3 reports would be at the
> same node within a few seconds, and that MergeRecord would merge all 3
> flowfiles into 1. But many times MergeRecord outputs 2 flowfiles.
> >
> > Any ideas how to force all into one flowfile.
> >
> > Kind regards
> > Jens M. Kofoed
>
>


Need help to merge all records in cluster into one flowfile

2022-08-29 Thread Jens M. Kofoed
Hi all

I'm running a 3 node cluster at version 1.16.2. I'm using
the SiteToSiteStatusReportingTask to monitor and check for any
backpressures or queues. I'm trying to merge all 3 reports into 1, but most
of the time I get 2 flowfiles after my MergeRecord.

To be sure the nodes are creating the reports at the same time, the
SiteToSiteStatusReportingTask is set to a CRON schedule of every 5
mins.
The connection from the input port to the next process is set with "Load
Balance Strategy" to Single node, to be sure all 3 reports are at the same
node.
In my MergeRecord the "Correlation Attribute Name" is set to
"reporting.task.uuid" which is the same for all 3 flowfiles.
"Minimum Number of Records" is set to 5000, which is much higher than the
total amounts of records.
"Minimum Bin Size" is set to 5 MB, which is also much higher than the total
size. Maximum "Number of Bins" is at default: 10
"Max Bin Age" is set to 10 s.

With these settings I was hoping that all 3 reports would be at the same
node within a few seconds, and that MergeRecord would merge all 3
flowfiles into 1. But many times MergeRecord outputs 2 flowfiles.

Any ideas how to force them all into one flowfile?

Kind regards
Jens M. Kofoed


StandardOauth2AccessTokenProvider gets "token not active"

2022-08-29 Thread Jens M. Kofoed
Hi community

I'm using the StandardOauth2AccessTokenProvider to get and refresh a token,
which works great. But at almost every refresh, one of the nodes in the
cluster gets this error. It's not the same node which gets the error every
time; all nodes get it, but only one node at a time.

2022-08-29 06:14:28,081 ERROR [Timer-Driven Process Thread-4]
org.apache.nifi.oauth2.StandardOauth2AccessTokenProvider
StandardOauth2AccessTokenProvider[id=861dbfea-0181-1000--d19b4cf0]
OAuth2 access token request failed [HTTP 400], response:
{"error":"invalid_grant","error_description":"Token is not active"}
2022-08-29 06:14:28,082 INFO [Timer-Driven Process Thread-4]
org.apache.nifi.oauth2.StandardOauth2AccessTokenProvider
StandardOauth2AccessTokenProvider[id=861dbfea-0181-1000--d19b4cf0]
Refresh Access Token request failed [
https://foo.bar/auth/realms/myrealm/protocol/openid-connect/token]
org.apache.nifi.processor.exception.ProcessException: OAuth2 access token
request failed [HTTP 400]
at
org.apache.nifi.oauth2.StandardOauth2AccessTokenProvider.getAccessDetails(StandardOauth2AccessTokenProvider.java:327)
at
org.apache.nifi.oauth2.StandardOauth2AccessTokenProvider.refreshAccessDetails(StandardOauth2AccessTokenProvider.java:315)
at
org.apache.nifi.oauth2.StandardOauth2AccessTokenProvider.getAccessDetails(StandardOauth2AccessTokenProvider.java:249)
at sun.reflect.GeneratedMethodAccessor408.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler.invoke(StandardControllerServiceInvocationHandler.java:254)
at
org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler.invoke(StandardControllerServiceInvocationHandler.java:105)
at com.sun.proxy.$Proxy183.getAccessDetails(Unknown Source)
at
org.apache.nifi.processors.standard.InvokeHTTP.lambda$configureRequest$3(InvokeHTTP.java:1108)
at java.util.Optional.ifPresent(Optional.java:159)
at
org.apache.nifi.processors.standard.InvokeHTTP.configureRequest(InvokeHTTP.java:1107)
at
org.apache.nifi.processors.standard.InvokeHTTP.onTrigger(InvokeHTTP.java:927)
at
org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at
org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1283)
at
org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:214)
at
org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:103)
at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)

I can't find any information in the log about when the process successfully
refreshes the token. So I can't see if all nodes in the cluster are
refreshing the token at the same time, or if it's only the primary node
which refreshes. If all nodes are refreshing, could it be that one node is
slower than the others to refresh, and that the old token becomes invalid
after the first node has refreshed it?

Kind regards
Jens M. Kofoed


Re: Load balancing just stopped working in NIFI 1.16.1

2022-05-19 Thread Jens M. Kofoed
Hi Mark

Thanks for the reply. The “funny” part is that it worked before and I haven't
made any change to the configuration.
And I have deleted the state folder on all nodes, and it still doesn't work.

It is the same configuration as 3 other clusters, except hostnames/IP
addresses and certificates of course.
All clusters are running 1.16.1. The 3 other clusters don't have any issues
(yet).
It all started after node 1 went down due to a memory problem. After that
they just don't like me anymore 🤔

Tomorrow I will give it a chance, and if it doesn't work I will reinstall the
cluster from scratch. I would like to be able to fix it, in case the same
situation happens with one of the production clusters. Hopefully all flows
are in the registry.

Kind regards 
Jens 

> On May 19, 2022, at 15:21, Mark Payne wrote:
> 
> Jens,
> 
> So that would tell us that the hostname or the port is wrong, or that NiFi is 
> not running/listening for load balanced connections. I would recommend the 
> following:
> 
> - Check nifi.properties on all nodes to make sure that the 
> nifi.cluster.load.balance.host property is set to the node’s hostname.
> - You could try deleting the state/ directory on the nodes and restarting. 
> Perhaps somehow it’s got an old value cached and is trying to communicate 
> with the wrong port?
> 
> Thanks
> -Mark
> 
> 
>> On May 19, 2022, at 7:42 AM, Jens M. Kofoed  wrote:
>> 
>> Hi
>> 
>> I have a 3 node test cluster running v. 1.16.1, which has been working fine, 
>> with no errors. But it doesn't do much, since it is my test cluster.
>> But now I am struggling with load balancing connections being refused between nodes.
>> Both node 2 and 3 are refusing load balancing connections, even after 
>> stopping the cluster. I have deleted all files at node 2 and 3, but they 
>> still refuse connections.
>> I have not made any changes to the configuration, but I have created a new 
>> flow which got node 1 to die.
>> The new flow uses an ExecuteStreamCommand with stdin/stdout to a shell 
>> command to manipulate some data (1 GB files). I set the "Concurrent 
>> Tasks" too high, so node 1 ran out of memory and stopped. It kept stopping 
>> when I tried to start it again. So I deleted the flow.gz file and the run, 
>> state and work folders and started the node. Now node 1 was running again. 
>> But after this, node 2 and 3 are refusing load balancing connections.
>> I can't see what this flow has to do with it, but I have now tried to 
>> remove all files at node 2 and 3 to get clean nodes. But they still don't 
>> work.
>> 
>> Does anyone have an idea how to debug further?
>> 
>> From the log:
>> 2022-05-19 13:22:46,268 ERROR [Load-Balanced Client Thread-2] 
>> o.a.n.c.q.c.c.a.n.NioAsyncLoadBalanceClient Unable to connect to 
>> nifi-n03:8443 for load balancing
>> java.net.ConnectException: Connection refused
>>at sun.nio.ch.Net.connect0(Native Method)
>>at sun.nio.ch.Net.connect(Net.java:482)
>>at sun.nio.ch.Net.connect(Net.java:474)
>>at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:647)
>>at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:107)
>>at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:92)
>>at org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClient.createChannel(NioAsyncLoadBalanceClient.java:497)
>>at org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClient.establishConnection(NioAsyncLoadBalanceClient.java:440)
>>at org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClient.communicate(NioAsyncLoadBalanceClient.java:234)
>>at org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClientTask.run(NioAsyncLoadBalanceClientTask.java:81)
>>at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
>>at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>>at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>at java.lang.Thread.run(Thread.java:748)
>> 
>> kind regards
>> Jens
> 


Load balancing just stopped working in NIFI 1.16.1

2022-05-19 Thread Jens M. Kofoed
Hi

I have a 3 node test cluster running v. 1.16.1, which has been working
fine, with no errors. But it doesn't do much, since it is my test cluster.
But now I am struggling with load balancing connections being refused between nodes.
Both node 2 and 3 are refusing load balancing connections, even after
stopping the cluster. I have deleted all files at node 2 and 3, but they
still refuse connections.
I have not made any changes to the configuration, but I have created a new
flow which got node 1 to die.
The new flow uses an ExecuteStreamCommand with stdin/stdout to a shell
command to manipulate some data (1 GB files). I set the "Concurrent
Tasks" too high, so node 1 ran out of memory and stopped. It kept stopping
when I tried to start it again. So I deleted the flow.gz file and the
run, state and work folders and started the node. Now node 1 was running
again. But after this, node 2 and 3 are refusing load balancing connections.
I can't see what this flow has to do with it, but I have now tried to
remove all files at node 2 and 3 to get clean nodes. But they still don't
work.

Does anyone have an idea how to debug further?
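
One thing I have not tried yet is checking what each node actually listens on
for load balancing (a sketch; 6342 is the default nifi.cluster.load.balance.port,
though the log below suggests this cluster is configured for 8443):

# on node 3: what is configured, and is anything listening on it?
grep 'nifi.cluster.load.balance' conf/nifi.properties
ss -tlnp | grep 8443

# from node 1 or 2: is the port reachable at all?
nc -vz nifi-n03 8443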

From the log:
2022-05-19 13:22:46,268 ERROR [Load-Balanced Client Thread-2]
o.a.n.c.q.c.c.a.n.NioAsyncLoadBalanceClient Unable to connect to
nifi-n03:8443 for load balancing
java.net.ConnectException: Connection refused
at sun.nio.ch.Net.connect0(Native Method)
at sun.nio.ch.Net.connect(Net.java:482)
at sun.nio.ch.Net.connect(Net.java:474)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:647)
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:107)
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:92)
at org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClient.createChannel(NioAsyncLoadBalanceClient.java:497)
at org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClient.establishConnection(NioAsyncLoadBalanceClient.java:440)
at org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClient.communicate(NioAsyncLoadBalanceClient.java:234)
at org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClientTask.run(NioAsyncLoadBalanceClientTask.java:81)
at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

 kind regards
Jens


Re: sftp processors start giving SSH_MSG_UNIMPLEMENTED errors after moving to v. 1.16.1

2022-05-04 Thread Jens M. Kofoed
Hi David

A JIRA ticket has been created:
https://issues.apache.org/jira/browse/NIFI-9989
including a stack trace.

If more information is needed, please say so :-)

Kind regards
Jens M. Kofoed


On Wed, May 4, 2022 at 14:29, David Handermann <
exceptionfact...@apache.org> wrote:

> Hi Jens,
>
> Thanks for reporting this problem, can you open a Jira issue?
>
> https://issues.apache.org/jira/browse/NIFI
>
> If you can provide a full stack trace, that would be helpful. It would
> also be helpful to know what strategy you are using for authentication: are
> you using a private key or a password?
>
> Setting the log level to DEBUG for the net.schmizz.sshj logger should also
> provide additional details.
>
> Regards,
> David Handermann
>
> On Wed, May 4, 2022 at 2:46 AM Jens M. Kofoed 
> wrote:
>
>> Hi
>>
>> After migrating our flow from a single instance running version 1.13.2
>> to a 3 node cluster running 1.16.1, many of our SFTP processors began to
>> give this error: net.schmizz.sshj.transport.TransportException: Received
>> SSH_MSG_UNIMPLEMENTED
>>
>> Both the NiFi and SFTP servers are running Ubuntu 20.04.
>>
>> Any idea why?
>> I have tried to debug it, but have not been able to find any reason.
>> I've changed the processors to only have 1 concurrent task instead of the old
>> settings with 3-5, since we also changed from a single instance to a cluster.
>>
>> Kind regards
>> Jens M. Kofoed
>>
>


sftp processors start giving SSH_MSG_UNIMPLEMENTED errors after moving to v. 1.16.1

2022-05-04 Thread Jens M. Kofoed
Hi

After migrating our flow from a single instance running version 1.13.2 to
a 3 node cluster running 1.16.1, many of our SFTP processors began to give
this error: net.schmizz.sshj.transport.TransportException: Received
SSH_MSG_UNIMPLEMENTED

Both the NiFi and SFTP servers are running Ubuntu 20.04.

Any idea why?
I have tried to debug it, but have not been able to find any reason. I've
changed the processors to only have 1 concurrent task instead of the old
settings with 3-5, since we also changed from a single instance to a cluster.
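
For reference, the DEBUG logging David suggested in the reply above can be
enabled with a single line in conf/logback.xml:

<logger name="net.schmizz.sshj" level="DEBUG"/>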

Kind regards
Jens M. Kofoed


ListFTP gets NullPointerException with files in DaylightSavingTime

2022-03-27 Thread Jens M. Kofoed
We are using an MS IIS FTP server, which has files with last modification
timestamps in the hour just before DST switches, e.g. 01:03 GMT+1.

After 02:00 the server switches to DST, but a bug in MS FTP shows all
files with new timestamps as DST. So a file which was originally 01:03 GMT+1
is now listed as 02:03.

This produces an NPE in the ListFTP processor:
ListFTP[id=0179107b-f93a-12c7-8a5e-52491c478f60]
failed to process session due to null; Processor Administratively Yielded
for 1 sec: java.lang.NullPointerException

I have created a JIRA: https://issues.apache.org/jira/browse/NIFI-9837

kind regards

Jens M. Kofoed


Re: InvokeHTTP performance - max out at 50 post per sec.

2022-03-04 Thread Jens M. Kofoed
Hi Isha

Thanks for your reply.
I have tried to play with the scheduling, and it doesn't help much. So I
will use concurrent tasks. I had hoped there were other things which
could be tweaked to get it to run faster.

kind regards
Jens M. Kofoed


On Fri, Mar 4, 2022 at 10:16, Isha Lamboo <
isha.lam...@virtualsciences.nl> wrote:

> Hi Jens,
>
>
>
> The behaviour you describe doesn't seem abnormal if the round-trip time
> for a request is around 20 ms in your test setup. A single InvokeHTTP would
> process requests one by one, and that gives 1000 ms / 20 ms = 50 requests per
> second. Increasing concurrent tasks to send requests in parallel is exactly
> the right solution here, and your test results show as much.
>
>
>
> If the round trip time is much smaller than 20 ms, you should be able to
> tune the scheduling to get more throughput per concurrent task. I would
> test increasing the run duration to let the processor take on more tasks
> without yielding in between. This will impact the rest of your flows if
> you set it too high, though.
>
>
>
> Regards,
>
>
>
> Isha
>
>
>
> *From:* Jens M. Kofoed 
> *Sent:* Friday, March 4, 2022 09:35
> *To:* users@nifi.apache.org
> *Subject:* InvokeHTTP performance - max out at 50 post per sec.
>
>
>
> Hi
>
>
>
> I have created some small tests with InvokeHTTP, and it seems that
> InvokeHTTP can only handle 50 flowfiles per second. If I configure the
> InvokeHTTP to have x Concurrent Tasks, the output increases x times.
>
>  I created a very small flow
>
> GenerateFlowFile
>
>   File Size: 100kb
>
>   Batch size: 100
>
>   Run Schedule: 1 sec
>
> ->
>
> InvokeHTTP:
>
>   HTTP Method: POST
>
>
>
> The receiving host is another NiFi node
> with HandleHttpRequest/HandleHttpResponse, and it has no issue handling 4x
> InvokeHTTP.
>
>
>
> So is there something in InvokeHTTP which can be tweaked to perform more
> than 50 files per sec?
>
>
>
> Kind regards
>
> Jens M. Kofoed
>
>
>
>
>


InvokeHTTP performance - max out at 50 post per sec.

2022-03-04 Thread Jens M. Kofoed
Hi

I have created some small tests with InvokeHTTP, and it seems that
InvokeHTTP can only handle 50 flowfiles per second. If I configure
InvokeHTTP to have x Concurrent Tasks, the output increases x times.
 I created a very small flow
GenerateFlowFile
  File Size: 100kb
  Batch size: 100
  Run Schedule: 1 sec
->
InvokeHTTP:
  HTTP Method: POST

The receiving host is another NiFi node
with HandleHttpRequest/HandleHttpResponse, and it has no issue handling 4x
InvokeHTTP.

So is there something in InvokeHTTP which can be tweaked to perform more
than 50 files per sec?
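
As a back-of-envelope model of what I am seeing (the 20 ms round trip below
is an assumed number, not a measurement):

// Groovy sketch: serialized requests bound throughput by round-trip time
def roundTripMillis = 20    // assumed duration of one POST round trip
def concurrentTasks = 1     // a single task handles requests one by one
def maxFilesPerSec = concurrentTasks * (1000 / roundTripMillis)
println maxFilesPerSec      // 50, which matches the ceiling I see

which would also explain why x concurrent tasks scale the output by x.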

Kind regards
Jens M. Kofoed


Not able to view or download content from queue

2022-02-01 Thread Jens M. Kofoed
Hi

I have a flow where I receive files which have been packed with tar and zip.
After unpacking the files I set the mime.type back to the original, which
is application/octet-stream.
If I try to view the content or download the content from a queue or from
provenance, I get this error message: "Error parsing media type ''"

Kind regards
Jens M. Kofoed


Re: PutTCP in version 1.15.3 runs with no inputs

2022-02-01 Thread Jens M. Kofoed
Hi Chris

Thanks for your reply. My workaround is in fact to set the scheduling
to 1 sec. Hopefully it is fixed and well tested for the next release. I
just can't understand how a bug like this could make it into a new release.

Kind regards
Jens M. Kofoed

On Tue, Feb 1, 2022 at 12:24, Chris Sampson <
chris.samp...@naimuri.com> wrote:

> Jens,
>
> A quick look at the first Jira ticket you linked (NIFI-9564) mentions that
> the processors have the "@TriggerWhenEmpty" annotation specified, which
> means that they will continually trigger according to the configured
> Scheduling even when there are no incoming FlowFiles. Whatsmore is that the
> processors don't actually do anything when triggered with no input FlowFile
> - so it's a completely wasted use of resources (N.B. I'm not really
> familiar with these processors, just reading the Jira ticket description
> and taking a very quick look at the linked PR).
>
> They also don't currently include the "@SupportsBatching" annotation,
> which can drastically improve performance when processing multiple input
> FlowFiles in a lot of processors.
>
> This Jira ticket has been resolved for 1.16.0, so hopefully the next
> feature release of NiFi will include the improvements. Until then, you
> could reduce the Scheduling of the processor to at least have it run fewer
> tasks (e.g. process every 1 second instead of 0 seconds). This would reduce
> the number of tasks/time, but also mean that any incoming FlowFiles would
> have to wait up to 1 second before being taken from the queue for
> processing, so you have to decide whether it's more important to cut down
> on resource waste or have the reduced latency for incoming FlowFiles. I've
> faced similar considerations for things like GetSQS in the past - I either
> leave it at "0 secs" and let it run continually or change it to "1 sec" and
> accept that some messages landing on the SQS queue will take a little
> longer to be picked up by NiFi.
>
>
>
> ---
> *Chris Sampson*
> IT Consultant
> chris.samp...@naimuri.com
>
>
> On Tue, 1 Feb 2022 at 10:42, Jens M. Kofoed 
> wrote:
>
>> Hi
>>
>> I don't know if these JIRAs are handling the same error I got with NIFI
>> v. 1.15.3
>> https://issues.apache.org/jira/browse/NIFI-9546 PutTCP / PutUDP are
>> inefficient
>> https://issues.apache.org/jira/browse/NIFI-9571 PutTCP and PutUDP not
>> committing Session
>>
>> But the PutTCP processor runs tasks continuously without any incoming flow
>> files.
>> Here the processor has only been running for a couple of seconds and has
>> already run 3,721,047 tasks.
>> [image: image.png]
>>
>> kind regards
>> Jens M. Kofoed
>>
>>


PutTCP in version 1.15.3 runs with no inputs

2022-02-01 Thread Jens M. Kofoed
Hi

I don't know if these JIRAs are handling the same error I got with NIFI v.
1.15.3
https://issues.apache.org/jira/browse/NIFI-9546 PutTCP / PutUDP are
inefficient
https://issues.apache.org/jira/browse/NIFI-9571 PutTCP and PutUDP not
committing Session

But the PutTCP processor runs tasks continuously without any incoming flow
files.
Here the processor has only been running for a couple of seconds and has
already run 3,721,047 tasks.
[image: image.png]

kind regards
Jens M. Kofoed


Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-11-03 Thread Jens M. Kofoed
Hi Mark

All the files in my test flow are 1GB files. But it happens in my production
flow with different file sizes.

When these issues have happened, I have the flowfile routed to an
UpdateAttribute processor which is disabled, just to keep the file in a queue.
When I enable the processor and send the file back for a new hash calculation,
the file is OK. So I don't think the test with backup and compare makes any
sense to do.

Regards 
Jens

> On Nov 3, 2021, at 15:57, Mark Payne  wrote:
> 
> So what I found interesting about the histogram output was that in each case, 
> the input file was 1 GB. The number of bytes that differed between the ‘good’ 
> and ‘bad’ hashes was something like 500-700 bytes whose values were 
> different. But the values ranged significantly. There was no indication that 
> the type of thing we’ve seen with NFS mounts was happening, where data was 
> nulled out until received and then updated. If that had been the case we’d 
> have seen the NUL byte (or some other value) have a very significant change 
> in the histogram, but we didn’t see that.
> 
> So a couple more ideas that I think can be useful.
> 
> 1) Which garbage collector are you using? It’s configured in the 
> bootstrap.conf file
> 
> 2) We can try to definitively prove out whether the content on the disk is 
> changing or if there’s an issue reading the content. To do this:
> 
> 1. Stop all processors.
> 2. Shutdown nifi
> 3. rm -rf content_repository; rm -rf flowfile_repository   (warning, this 
> will delete all FlowFiles & content, so only do this on a dev/test system 
> where you’re comfortable deleting it!)
> 4. Start nifi
> 5. Let exactly 1 FlowFile into your flow.
> 6. While it is looping through, create a copy of your entire Content 
> Repository: cp -r content_repository content_backup1; zip content_backup1.zip 
> content_backup1
> 7. Wait for the hashes to differ
> 8. Create another copy of the Content Repository: cp -r content_repository 
> content_backup2
> 9. Find the files within the content_backup1 and content_backup2 and compare 
> them to see if they are identical. Would recommend comparing them using each 
> of the 3 methods: sha256, sha512, diff
> 
> This should make it pretty clear that either:
> (1) the issue resides in the software: either NiFi or the JVM
> (2) the issue resides outside of the software: the disk, the disk driver, the 
> operating system, the VM hypervisor, etc.
> 
> Thanks
> -Mark
> 
>> On Nov 3, 2021, at 10:44 AM, Joe Witt  wrote:
>> 
>> Jens,
>> 
>> 184 hours (7.6 days) in and zero issues.
>> 
>> Will need to turn this off soon but wanted to give a final update.
>> Looks great.  Given the information on your system there appears to be
>> something we dont understand related to the virtual file system
>> involved or something.
>> 
>> Thanks
>> 
>>> On Tue, Nov 2, 2021 at 10:55 PM Jens M. Kofoed  
>>> wrote:
>>> 
>>> Hi Mark
>>> 
>>> Of course, sorry :-)  By looking at the error messages, I can see that
>>> only the histograms with differences are listed. And all 3
>>> have the first issue at histogram.9. Don't know what that means.
>>> 
>>> /Jens
>>> 
>>> Here are the error log:
>>> 2021-11-01 23:57:21,955 ERROR [Timer-Driven Process Thread-10] 
>>> org.apache.nifi.processors.script.ExecuteScript 
>>> ExecuteScript[id=c7d3335b-1045-14ed--a0d62c70] There are 
>>> differences in the histogram
>>> Byte Value: histogram.10, Previous Count: 11926720, New Count: 11926721
>>> Byte Value: histogram.100, Previous Count: 11927504, New Count: 11927503
>>> Byte Value: histogram.101, Previous Count: 11925396, New Count: 11925407
>>> Byte Value: histogram.102, Previous Count: 11929923, New Count: 11929941
>>> Byte Value: histogram.103, Previous Count: 11931596, New Count: 11931591
>>> Byte Value: histogram.104, Previous Count: 11929071, New Count: 11929064
>>> Byte Value: histogram.105, Previous Count: 11931365, New Count: 11931348
>>> Byte Value: histogram.106, Previous Count: 11928661, New Count: 11928645
>>> Byte Value: histogram.107, Previous Count: 11929864, New Count: 11929866
>>> Byte Value: histogram.108, Previous Count: 11931611, New Count: 11931642
>>> Byte Value: histogram.109, Previous Count: 11932758, New Count: 11932763
>>> Byte Value: histogram.110, Previous Count: 11927893, New Count: 11927895
>>> Byte Value: histogram.111, Previous Count: 11933519, New Count: 11933522
>>> Byte Value: histogram.112, Previous Count: 11931392, New Count: 11931397
>>> Byte Value: his

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-11-01 Thread Jens M. Kofoed
5;11931023
histogram.106;11932342
histogram.107;11929778
histogram.108;11930098
histogram.109;11930759
histogram.11;0
histogram.110;11934343
histogram.111;11935775
histogram.112;11933877
histogram.113;11926675
histogram.114;11929332
histogram.115;11928876
histogram.116;11927819
histogram.117;11932657
histogram.118;11933508
histogram.119;11928808
histogram.12;0
histogram.120;11937532
histogram.121;11926907
histogram.122;11933942
histogram.123;0
histogram.124;0
histogram.125;0
histogram.126;0
histogram.127;0
histogram.128;0
histogram.129;0
histogram.13;0
histogram.130;0
histogram.131;0
histogram.132;0
histogram.133;0
histogram.134;0
histogram.135;0
histogram.136;0
histogram.137;0
histogram.138;0
histogram.139;0
histogram.14;0
histogram.140;0
histogram.141;0
histogram.142;0
histogram.143;0
histogram.144;0
histogram.145;0
histogram.146;0
histogram.147;0
histogram.148;0
histogram.149;0
histogram.15;0
histogram.150;0
histogram.151;0
histogram.152;0
histogram.153;0
histogram.154;0
histogram.155;0
histogram.156;0
histogram.157;0
histogram.158;0
histogram.159;0
histogram.16;0
histogram.160;0
histogram.161;0
histogram.162;0
histogram.163;0
histogram.164;0
histogram.165;0
histogram.166;0
histogram.167;0
histogram.168;0
histogram.169;0
histogram.17;0
histogram.170;0
histogram.171;0
histogram.172;0
histogram.173;0
histogram.174;0
histogram.175;0
histogram.176;0
histogram.177;0
histogram.178;0
histogram.179;0
histogram.18;0
histogram.180;0
histogram.181;0
histogram.182;0
histogram.183;0
histogram.184;0
histogram.185;0
histogram.186;0
histogram.187;0
histogram.188;0
histogram.189;0
histogram.19;0
histogram.190;0
histogram.191;0
histogram.192;0
histogram.193;0
histogram.194;0
histogram.195;0
histogram.196;0
histogram.197;0
histogram.198;0
histogram.199;0
histogram.2;0
histogram.20;0
histogram.200;0
histogram.201;0
histogram.202;0
histogram.203;0
histogram.204;0
histogram.205;0
histogram.206;0
histogram.207;0
histogram.208;0
histogram.209;0
histogram.21;0
histogram.210;0
histogram.211;0
histogram.212;0
histogram.213;0
histogram.214;0
histogram.215;0
histogram.216;0
histogram.217;0
histogram.218;0
histogram.219;0
histogram.22;0
histogram.220;0
histogram.221;0
histogram.222;0
histogram.223;0
histogram.224;0
histogram.225;0
histogram.226;0
histogram.227;0
histogram.228;0
histogram.229;0
histogram.23;0
histogram.230;0
histogram.231;0
histogram.232;0
histogram.233;0
histogram.234;0
histogram.235;0
histogram.236;0
histogram.237;0
histogram.238;0
histogram.239;0
histogram.24;0
histogram.240;0
histogram.241;0
histogram.242;0
histogram.243;0
histogram.244;0
histogram.245;0
histogram.246;0
histogram.247;0
histogram.248;0
histogram.249;0
histogram.25;0
histogram.250;0
histogram.251;0
histogram.252;0
histogram.253;0
histogram.254;0
histogram.255;0
histogram.26;0
histogram.27;0
histogram.28;0
histogram.29;0
histogram.3;0
histogram.30;0
histogram.31;0
histogram.32;11929486
histogram.33;11930737
histogram.34;11931092
histogram.35;11934488
histogram.36;11927605
histogram.37;11930735
histogram.38;11932174
histogram.39;11936180
histogram.4;0
histogram.40;11931666
histogram.41;11927043
histogram.42;11929044
histogram.43;11934104
histogram.44;11936337
histogram.45;11935580
histogram.46;11929598
histogram.47;11934083
histogram.48;11928858
histogram.49;11931098
histogram.5;0
histogram.50;11930618
histogram.51;11925429
histogram.52;11929741
histogram.53;11934160
histogram.54;11931999
histogram.55;11930465
histogram.56;11926194
histogram.57;11926386
histogram.58;11924871
histogram.59;11929331
histogram.6;0
histogram.60;11926951
histogram.61;11928631
histogram.62;11927549
histogram.63;23856730
histogram.64;11930288
histogram.65;11931523
histogram.66;11932821
histogram.67;11932509
histogram.68;11929613
histogram.69;11928651
histogram.7;0
histogram.70;11929253
histogram.71;11931521
histogram.72;11925805
histogram.73;11934833
histogram.74;11928314
histogram.75;11923854
histogram.76;11930892
histogram.77;11927528
histogram.78;11932850
histogram.79;11934471
histogram.8;0
histogram.80;11925707
histogram.81;11929213
histogram.82;11931334
histogram.83;11936739
histogram.84;11927855
histogram.85;11931668
histogram.86;11928609
histogram.87;11931930
histogram.88;11934341
histogram.89;11927519
histogram.9;11928004
histogram.90;11933502
histogram.91;0
histogram.92;0
histogram.93;0
histogram.94;11932024
histogram.95;11932693
histogram.96;0
histogram.97;11928428
histogram.98;11933195
histogram.99;11924273
histogram.totalBytes;1073741824

Kind regards
Jens

On Sun, Oct 31, 2021 at 21:40, Joe Witt  wrote:

> Jens,
>
> 118 hours in - still good.
>
> Thanks
>
> On Fri, Oct 29, 2021 at 10:22 AM Joe Witt  wrote:
> >
> > Jens
> >
> > Update from hour 67.  Still lookin' good.
> >
> > Will advise.
> >
> > Thanks
> >
> > On Thu, Oct 28, 2021 at 8:08 AM Jens M. Kofoed 
> wrote:
> > >
> > > Many many thanks 🙏 Joe for looking into this. My test flow was
> running for 6 days before the first error occurred

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-28 Thread Jens M. Kofoed
Many many thanks 🙏 Joe for looking into this. My test flow was running for 6 
days before the first error occurred

Thanks

> On Oct 28, 2021, at 16:57, Joe Witt  wrote:
> 
> Jens,
> 
> Am 40+ hours in running both your flow and mine to reproduce.  So far
> neither have shown any sign of trouble.  Will keep running for another
> week or so if I can.
> 
> Thanks
> 
>> On Wed, Oct 27, 2021 at 12:42 PM Jens M. Kofoed  
>> wrote:
>> 
>> The physical hosts with VMware are using VMFS, but the VMs running
>> on the hosts can't see that.
>> But you asked about the underlying file system 😀 and since my first answer
>> with the copy from the fstab file wasn't enough, I just wanted to give all
>> the details 😁.
>> 
>> If you create a vm for windows you would probably use NTFS (on top of vmfs). 
>> For Linux EXT3, EXT4, BTRFS, XFS and so on.
>> 
>> All the partitions on my NiFi nodes are local devices (sda, sdb, sdc and
>> sdd) for each Linux machine. I don't use NFS.
>> 
>> Kind regards
>> Jens
>> 
>> 
>> 
>> On Oct 27, 2021, at 17:47, Joe Witt  wrote:
>> 
>> Jens,
>> 
>> I don't quite follow the EXT4 usage on top of VMFS but the point here
>> is you'll ultimately need to truly understand your underlying storage
>> system and what sorts of guarantees it is giving you.  If linux/the
>> jvm/nifi think it has a typical EXT4 type block storage system to work
>> with it can only be safe/operate within those constraints.  I have no
>> idea about what VMFS brings to the table or the settings for it.
>> 
>> The sync properties I shared previously might help force the issue of
>> ensuring a formal sync/flush cycle all the way through the disk has
>> occurred which we'd normally not do or need to do but again in some
>> cases offers a stronger guarantee in exchange for performance.
>> 
>> In any case...Mark's path for you here will help identify what we're
>> dealing with and we can go from there.
>> 
>> I am aware of significant usage of NiFi on VMWare configurations
>> without issue at high rates for many years so whatever it is here is
>> likely solvable.
>> 
>> Thanks
>> 
>> On Wed, Oct 27, 2021 at 7:28 AM Jens M. Kofoed  
>> wrote:
>> 
>> 
>> Hi Mark
>> 
>> 
>> Thanks for the clarification. I will implement the script when I return to
>> the office on Monday next week (November 1st).
>>
>> I don't use NFS, but ext4. But I will implement the script so we can check
>> whether that is the case here. I think the issue might be after the processors
>> write new content to the repository.
>>
>> I have a test flow running for more than 2 weeks without any errors. But
>> this flow only calculates hashes and compares them.
>>
>>
>> Two other flows both create errors. One flow uses
>> PutSFTP->FetchSFTP->CryptographicHashContent->compares. The other flow uses
>> MergeContent->UnpackContent->CryptographicHashContent->compares. The last
>> flow is totally inside NiFi, excluding other network/server issues.
>>
>>
>> In both cases the CryptographicHashContent is right after a processor which
>> writes new content to the repository. But in one case a file in our
>> production flow calculated a wrong hash 4 times with a 1 minute delay
>> between each calculation. A few hours later I looped the file back and this
>> time it was OK.
>> 
>> Just like the case in step 5 and 12 in the pdf file
>> 
>> 
>> I will let you all know more later next week
>> 
>> 
>> Kind regards
>> 
>> Jens
>> 
>> 
>> 
>> 
>> On Oct 27, 2021, at 15:43, Mark Payne  wrote:
>> 
>> 
>> And the actual script:
>> 
>> 
>> 
>> import org.apache.nifi.flowfile.FlowFile
>> 
>> 
>> import java.util.stream.Collectors
>> 
>> 
>> Map getPreviousHistogram(final FlowFile flowFile) {
>> 
>>   final Map histogram = 
>> flowFile.getAttributes().entrySet().stream()
>> 
>>   .filter({ entry -> entry.getKey().startsWith("histogram.") })
>> 
>>   .collect(Collectors.toMap({ entry -> entry.key}, { entry -> 
>> entry.value }))
>> 
>>   return histogram;
>> 
>> }
>> 
>> 
>> Map createHistogram(final FlowFile flowFile, final 
>> InputStream inStream) {
>> 
>>   final Map histogram = new HashMap<>();
>> 
>>   final int[] distribution = new int[2

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-27 Thread Jens M. Kofoed
bytes. Once the NFS server receives those 
>> bytes, it then goes back and fills in the proper bytes. So if you’re running 
>> on NFS, it is possible for the contents of the file on the underlying file 
>> system to change out from under you. It’s not clear to me what other types 
>> of file system might do something similar.
>> 
>> So, one thing that we can do is to find out whether or not the contents of 
>> the underlying file have changed in some way, or if there’s something else 
>> happening that could perhaps result in the hashes being wrong. I’ve put 
>> together a script that should help diagnose this.
>> 
>> Can you insert an ExecuteScript processor either just before or just after 
>> your CryptographicHashContent processor? Doesn’t really matter whether it’s 
>> run just before or just after. I’ll attach the script here. It’s a Groovy 
>> Script so you should be able to use ExecuteScript with Script Engine = 
>> Groovy and the following script as the Script Body. No other changes needed.
>> 
>> The way the script works, it reads in the contents of the FlowFile, and then 
>> it builds up a histogram of all byte values (0-255) that it sees in the 
>> contents, and then adds that as attributes. So it adds attributes such as:
>> histogram.0 = 280273
>> histogram.1 = 2820
>> histogram.2 = 48202
>> histogram.3 = 3820
>> …
>> histogram.totalBytes = 1780928732
>> 
>> It then checks if those attributes have already been added. If so, after 
>> calculating that histogram, it checks against the previous values (in the 
>> attributes). If they are the same, the FlowFile goes to ’success’. If they 
>> are different, it logs an error indicating the before/after value for any 
>> byte whose distribution was different, and it routes to failure.
>> 
>> So, if for example, the first time through it sees 280,273 bytes with a 
>> value of ‘0’, and the second times it only sees 12,001 then we know there 
>> were a bunch of 0’s previously that were updated to be some other value. And 
>> it includes the total number of bytes in case somehow we find that we’re 
>> reading too many bytes or not enough bytes or something like that. This 
>> should help narrow down what’s happening.
>> 
>> Thanks
>> -Mark
>> 
>> 
>> 
>>> On Oct 26, 2021, at 6:25 PM, Joe Witt  wrote:
>>> 
>>> Jens
>>> 
>>> Attached is the flow I was using (now running yours and this one).  Curious 
>>> if that one reproduces the issue for you as well.
>>> 
>>> Thanks
>>> 
>>>> On Tue, Oct 26, 2021 at 3:09 PM Joe Witt  wrote:
>>>> Jens
>>>> 
>>>> I have your flow running and will keep it running for several days/week to 
>>>> see if I can reproduce.  Also of note please use your same test flow but 
>>>> use HashContent instead of crypto hash.  Curious if that matters for any 
>>>> reason...
>>>> 
>>>> Still want to know more about your underlying storage system.
>>>> 
>>>> You could also try updating nifi.properties and changing the following 
>>>> lines:
>>>> nifi.flowfile.repository.always.sync=true
>>>> nifi.content.repository.always.sync=true
>>>> nifi.provenance.repository.always.sync=true
>>>> 
>>>> It will hurt performance but can be useful/necessary on certain storage 
>>>> subsystems.
>>>> 
>>>> Thanks
>>>> 
>>>>> On Tue, Oct 26, 2021 at 12:05 PM Joe Witt  wrote:
>>>>> Ignore "For the scenario where you can replicate this please share the 
>>>>> flow.xml.gz for which it is reproducible."  I see the uploaded JSON
>>>>> 
>>>>>> On Tue, Oct 26, 2021 at 12:04 PM Joe Witt  wrote:
>>>>>> Jens,
>>>>>> 
>>>>>> We asked about the underlying storage system.  You replied with some 
>>>>>> info but not the specifics.  Do you know precisely what the underlying 
>>>>>> storage is and how it is presented to the operating system?  For 
>>>>>> instance is it NFS or something similar?
>>>>>> 
>>>>>> I've setup a very similar flow at extremely high rates running for the 
>>>>>> past several days with no issue.  In my case though I know precisely 
>>>>>> what the config is and the disk setup is.  Didn't do anything special to 
>>>>>> be clear but still it is important to know.

Re: Nifi secured cluster can't send heartbeat between nodes

2021-10-27 Thread Jens M. Kofoed
Hi

I don't know if this will help, but the certificate used by NiFi needs to be
both a server auth and a client auth certificate. Normally certificates are only
one of them. The NiFi certificate uses server auth for the web UI and when other
servers connect to it. It uses client auth when NiFi talks to other NiFi
servers.
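
A quick way to check what a certificate actually carries (the file name is
taken from the steps quoted below):

openssl x509 -in full-nifi-nodeX.pem -noout -text | grep -A1 'Extended Key Usage'

It should list both "TLS Web Server Authentication" and "TLS Web Client
Authentication".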
Regards 
Jens

> On Oct 27, 2021, at 15:27, QUEVILLON EMMANUEL - EXT-SAFRAN ENGINEERING
> SERVICES (SAFRAN)  wrote:
> 
> Hi list,
>  
> I’m facing a weird problem I can’t resolve or even understand with my secured 
> nifi cluster. Below is the situation.
> We have setup a secured nifi cluster with 3 nodes, say node1,node2 and node3.
> For each of theses nodes, we’ve manually created a SSL certificate signing 
> request (CSR) (using a password protected private key) to be signed by our 
> internal CA.
> Once we’ve get the certificates signed, I’ve installed each node certificates 
> following this procedure:
>  
> 1)  Add the full certificate chain (root + intermediate certificates) 
> into the signed certificate.
> cat nifi-nodeX.pem cert_chain.pem  > full-nifi-nodeX.pem
> 2)  Create a PKCS12 certificate using private key (.key) and full signed 
> certificate (.pem)
> openssl pkcs12 -export  -in full-nifi-nodeX.pem  -inkey nifi-nodeX.key  -out 
> nifi-nodeX.p12 \
> -name nifi-nodeX -passin pass:"XX" -passout 
> pass:Y;
> 3)  Import nifi-nodeX.p12 into the nifi-nodeX keystore
> keytool -importkeystore -deststorepass xx -destkeystore keystore.jks
> -srckeystore nifi-nodeX.p12 -srcstoretype PKCS12
> 4)  Then added each other nifi-node certificates (.pem) into 
> nifi-truststore
> node1: add full-nifi-node2 + full-nifi-node3 into truststore
> node2: add full-nifi-node1 + full-nifi-node3 into truststore
> node3: add full-nifi-node2 + full-nifi-node1 into truststore
> 5)  Restarted each node
>  
> Once all nodes are restarted, I can connect to the web UI, but I get an
> error message saying:
>  
> For info, web UI is reachable on port 8443
>  
> Invalid State:
> The Flow Controller is initializing the Data Flow.
>  
> Looking at node logs (nifi-app.log) I can see that each node cannot talk to 
> each other and to the Coordinator to send heartbeat messages:
>  
> Nifi-node1:
>  
> INFO [main] o.a.n.c.c.n.LeaderElectionNodeProtocolSender Determined that 
> Cluster Coordinator is located at nifi-node2:11443; will use this address for 
> sending heartbeat messages
> INFO [main] o.a.n.c.p.AbstractNodeProtocolSender Cluster Coordinator is 
> located at nifi-node2:11443. Will send Cluster Connection Request to this 
> address
> WARN [main] o.a.nifi.controller.StandardFlowService Failed to connect to 
> cluster due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed 
> unmarshalling 'CONNECTION_RESPONSE' protocol message from 
> nifi-node2.rd1.rf1/10.108.70.39:11443 due to: 
> javax.net.ssl.SSLHandshakeException: Received fatal alert: bad_certificate
> …
>  
> Nifi-node2:
>  
> WARN [Process Cluster Protocol Request-1] 
> o.a.n.c.p.impl.SocketProtocolListener Failed processing protocol message from 
> nifi-node3 due to Empty client certificate chain
> INFO [main] o.a.n.c.c.n.LeaderElectionNodeProtocolSender Determined that 
> Cluster Coordinator is located at nifi-node2:11443; will use this address for 
> sending heartbeat messages
> INFO [main] o.a.n.c.p.AbstractNodeProtocolSender Cluster Coordinator is 
> located at nifi-node2:11443. Will send Cluster Connection Request to this 
> address
> WARN [main] o.a.nifi.controller.StandardFlowService Failed to connect to 
> cluster due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed 
> marshalling 'CONNECTION_REQUEST' protocol message due to: 
> java.net.SocketException: Connection reset by peer (Write failed)
> …
>  
> Nifi-node3:
>  
> INFO [main] o.a.n.c.c.n.LeaderElectionNodeProtocolSender Determined that 
> Cluster Coordinator is located at nifi-node2:11443; will use this address for 
> sending heartbeat messages
> INFO [main] o.a.n.c.p.AbstractNodeProtocolSender Cluster Coordinator is 
> located at nifi-node2:11443. Will send Cluster Connection Request to this 
> address
> WARN [main] o.a.nifi.controller.StandardFlowService Failed to connect to 
> cluster due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed 
> unmarshalling 'CONNECTION_RESPONSE' protocol message from 
> nifi-node2/10.108.70.39:11443 due to: javax.net.ssl.SSLHandshakeException: 
> Received fatal alert: bad_certificate
>  
> It looks like the signed certificates are not ok regarding the logs errors.
> However, trying these certificates using openssl s_client command works as 
> expected:
>  
> openssl s_client -connect nifi-node3:11443 -cert full-nifi-node3.pem -key 
> nifi-node3.key -pass pass:'XXX’
> CONNECTED(0003)
> depth=3 C = FR, O = SAFRAN, OU = 0002 562082909, CN = SAFRAN Root CA 1
> verify return:1
> depth=2 C = FR, O = SAFRAN, OU = 0002 562082909, CN = SAFRAN Corporate CA 1

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-20 Thread Jens M. Kofoed
Dear Mark and Joe

I know my setup isn't normal for many people. But if we only look at my
receive side, which the last mails are about, everything is happening on the
same NiFi instance. It is the same 3 node NiFi cluster.
After fetching a FlowFile-stream file and unpacking it back into NiFi, I
calculate a SHA-256. 1 minute later I recalculate the SHA-256 on the exact same
file and get a new hash. That is what worries me.
The fact that the same file can be recalculated and produce two different
hashes is very strange, but it happens. Over the last 5 months it has only
happened 35-40 times.

I could understand it if the file were not completely loaded and saved into the
content repository before the hashing starts. But I believe that the unpack
processor doesn't forward the flowfile to the next processor before it is 100%
finished unpacking and saving the new content to the repository.

I have a test flow where a GenerateFlowFile has created 6x 1GB files (2 files
per node) and the next processor was a HashContent before it ran into a test
loop, where files are uploaded via PutSFTP to a test server, then downloaded
again and the hash recalculated. I have had one issue after 3 days of running.
Now the test flow is running without the Put/Fetch SFTP processors.

Another problem is that I can't find any correlation to other events. Not
within NiFi, nor the server itself or VMware. If I could just find any other
event which happens at the same time, I might be able to force some kind of
event to trigger the issue.
I have tried to force VMware to migrate a NiFi node to another host, forcing it
to take a snapshot and deleting snapshots, but nothing triggers an error.

I know it will be very, very difficult to reproduce. But I will set up multiple
NiFi instances running different test flows to see if I can find any reason why
it behaves as it does.
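
For reference, the loop test is essentially the following, written here as an
ExecuteScript (Groovy) sketch rather than the real flow; the attribute name
"initial.hash" is just something I picked for illustration:

import java.security.MessageDigest
import org.apache.nifi.processor.io.InputStreamCallback

def flowFile = session.get()
if (flowFile == null) return

// stream the content through SHA-256 without loading it all into memory
def digest = MessageDigest.getInstance('SHA-256')
session.read(flowFile, { inputStream ->
    byte[] buffer = new byte[8192]
    int read
    while ((read = inputStream.read(buffer)) != -1) {
        digest.update(buffer, 0, read)
    }
} as InputStreamCallback)
def hash = digest.digest().encodeHex().toString()

def previous = flowFile.getAttribute('initial.hash')
if (previous == null) {
    // first pass: remember the hash, then the flow loops the file back here
    flowFile = session.putAttribute(flowFile, 'initial.hash', hash)
    session.transfer(flowFile, REL_SUCCESS)
} else if (previous == hash) {
    session.transfer(flowFile, REL_SUCCESS)
} else {
    // the case that should never happen, but does
    log.error("Hash changed for ${flowFile.getAttribute('uuid')}: ${previous} -> ${hash}")
    session.transfer(flowFile, REL_FAILURE)
}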

Kind Regards
Jens M. Kofoed

> On Oct 20, 2021, at 16:39, Mark Payne  wrote:
> 
> Jens,
> 
> Thanks for sharing the images.
> 
> I tried to setup a test to reproduce the issue. I’ve had it running for quite 
> some time. Running through millions of iterations.
> 
> I’ve used 5 KB files, 50 KB files, 50 MB files, and larger (to the tune of 
> hundreds of MB). I’ve been unable to reproduce an issue after millions of 
> iterations.
> 
> So far I cannot replicate. And since you’re pulling the data via SFTP and 
> then unpacking, which preserves all original attributes from a different 
> system, this can easily become confusing.
> 
> Recommend trying to reproduce with SFTP-related processors out of the 
> picture, as Joe is mentioning. Either using GetFile/FetchFile or 
> GenerateFlowFile. Then immediately use CryptographicHashContent to generate 
> an ‘initial hash’, copy that value to another attribute, and then loop, 
> generating the hash and comparing against the original one. I’ll attach a 
> flow that does this, but not sure if the email server will strip out the 
> attachment or not.
> 
> This way we remove any possibility of actual corruption between the two nifi 
> instances. If we can still see corruption / different hashes within a single 
> nifi instance, then it certainly warrants further investigation but i can’t 
> see any issues so far.
> 
> Thanks
> -Mark
> 
> 
> 
> 
> 
>> On Oct 20, 2021, at 10:21 AM, Joe Witt  wrote:
>> 
>> Jens
>> 
>> Actually is this current loop test contained within a single nifi and there 
>> you see corruption happen?
>> 
>> Joe
>> 
>> On Wed, Oct 20, 2021 at 7:14 AM Joe Witt  wrote:
>> Jens,
>> 
>> You have a very involved setup including other systems (non NiFi).  Have you 
>> removed those systems from the equation so you have more evidence to support 
>> your expectation that NiFi is doing something other than you expect?
>> 
>> Joe
>> 
>> On Wed, Oct 20, 2021 at 7:10 AM Jens M. Kofoed  
>> wrote:
>> Hi
>> 
>> Today I have another file which has been running through the retry loop one
>> time. To test the processors and the algorithm I added the HashContent
>> processor and also added hashing by SHA-1.
>> The file has been going through the system, and both the SHA-1 and SHA-256
>> are different than expected. With a 1 minute delay the file goes
>> back into the hash content flow, and this time it calculates both hashes
>> fine.
>> 
>> I don't believe that the hashing is buggy, but something is very very 
>> strange. What can influence the processors/algorithm to calculate a 
>> different hash???
>> All the input/output claim information is exactly the same. It is the same 
>> flow/content file going in a loop. It happens on all 3 nodes.
>> 
>> Any suggestions for where to di

Re: Nifi and Registry behind Citrix ADC.

2021-10-18 Thread Jens M. Kofoed
Only if you want other ways to authenticate users. I have set up our NiFi
systems to talk with our MS AD via LDAPS, and defined different AD groups
which have different policy rules in NiFi. Some people can manage
everything; others can only start/stop specific processors in specific process
groups.
Using personal certificates is no problem; I have some admins who also
use their personal certificates. But with certificates you would have to
add and manage users manually in NiFi. Users can of course be added to
internal groups in NiFi, with policies configured on the groups.
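
In outline, our setup is just the stock LDAP provider in
login-identity-providers.xml, something like this (values anonymized and
property names from memory, so double-check against the Admin Guide):

<provider>
    <identifier>ldap-provider</identifier>
    <class>org.apache.nifi.ldap.LdapProvider</class>
    <property name="Authentication Strategy">LDAPS</property>
    <property name="Manager DN">cn=nifi-svc,ou=services,dc=example,dc=lan</property>
    <property name="Manager Password">***</property>
    <property name="Url">ldaps://ad.example.lan:636</property>
    <property name="User Search Base">ou=users,dc=example,dc=lan</property>
    <property name="User Search Filter">sAMAccountName={0}</property>
    <property name="Identity Strategy">USE_USERNAME</property>
</provider>

with nifi.security.user.login.identity.provider=ldap-provider set in
nifi.properties.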

Regards
Jens

On Tue, Oct 19, 2021 at 07:43, Jakobsson Stefan <
stefan.jakobs...@scania.com> wrote:

> We are currently authenticating with personal certificates, should we
> change that then?
>
>
>
> *Stefan Jakobsson*
>
>
> Systems Manager  |  Scania IT, IKCA |  Scania CV AB
>
> Phone: +46 8 553 527 27 Mobile: +46 7 008 834 76
>
> Forskargatan 20, SE-151 87 Södertälje, Sweden
>
> stefan.jakobs...@scania.com
>
>
>
> *From:* Shawn Weeks 
> *Sent:* den 18 oktober 2021 21:35
> *To:* users@nifi.apache.org
> *Subject:* RE: Nifi and Registry behind Citrix ADC.
>
>
>
> Unless you’re operating the LB in TCP Mode you’ll need to configure NiFi
> to use an alternative authentication method like SAML, LDAP, OIDC, etc.
> You’ll also need to make sure that your proxy is passing the various HTTP
> headers through to NiFi and that NiFi is expecting traffic from a proxy. If
> you look in the nifi-user.log and nifi-app.log there might be some hints
> about what it didn’t like.
>
>
>
> Thanks
>
> Shawn
>
>
>
> *From:* Jakobsson Stefan 
> *Sent:* Monday, October 18, 2021 2:26 PM
> *To:* users@nifi.apache.org
> *Subject:* RE: Nifi and Registry behind Citrix ADC.
>
>
>
> Ahh, no, ADC as in application delivery and load balancing 😊
>
>
>
> *Stefan Jakobsson*
>
>
> Systems Manager  |  Scania IT, IKCA |  Scania CV AB
>
> Phone: +46 8 553 527 27 Mobile: +46 7 008 834 76
>
> Forskargatan 20, SE-151 87 Södertälje, Sweden
>
> stefan.jakobs...@scania.com
>
>
>
> *From:* Lehel Boér 
> *Sent:* den 18 oktober 2021 15:03
> *To:* users@nifi.apache.org
> *Subject:* Re: Nifi and Registry behind Citrix ADC.
>
>
>
> Hi Stefan,
>
>
>
> Please disregard my prior response. The name mislead me, I discovered ADC
> is not the same as Active Directory.
>
>
>
> Kind Regards,
>
> Lehel Boér
>
>
>
> Lehel Boér  ezt írta (időpont: 2021. okt. 18., H,
> 14:54):
>
> Hi Stefan,
>
>
>
> Have you tried setting up NiFi with an LDAP provider? Here are a few
> useful links.
>
>
>
> -
> https://docs.cloudera.com/HDPDocuments/HDF3/HDF-3.4.1.1/nifi-security/content/ldap_login_identity_provider.html
>
> - https://pierrevillard.com/2017/01/24/integration-of-nifi-with-ldap
>
>
>
> Kind Regards,
>
> Lehel Boér
>
>
>
> Jakobsson Stefan  ezt írta (időpont: 2021.
> okt. 18., H, 13:02):
>
> Hello,
>
>
>
> I have some issues trying to run Nifi and Nifi-registry behind an ADC.
> Reason for this is that we need NiFi to be accessible from AWS onto our on-prem
> NiFi installation due to demands from our IT sec department
>
>
>
> Anyhow, I can connect to NiFi Registry on the server's IP (i.e.
> x.x.x.x:9443/nifi-registry) without problems, but if I try to use the URL
> set up in the ADC with 9443 redirected to the NiFi server's IP we get an error
> saying:
>
>
>
> This page isn’t working
>
> *nifiprod.oururl.com * didn’t send any data.
>
> ERR_EMPTY_RESPONSE
>
>
>
> Does anyone have ideas on what I should start looking at? I set the https.host
> to 0.0.0.0 in nifi-registry.conf.
>
>
>
> *Stefan Jakobsson*
>
>
> Systems Manager  |  Scania IT, IKCA |  Scania CV AB
>
> Phone: +46 8 553 527 27 Mobile: +46 7 008 834 76
>
> Forskargatan 20, SE-151 87 Södertälje, Sweden
>
> stefan.jakobs...@scania.com
>
>
>
>


Re: ${hostname(true)} returns localhost

2021-09-09 Thread Jens M. Kofoed
Thanks, I will check that after a reboot.

Regards 
Jens

> On Sep 9, 2021, at 15:27, Shawn Weeks  wrote:
> 
> Adding the hostname to /etc/hosts with 127.0.0.1 as the IP is common in 
> cloud-init VM images so double check that it’s not getting rewritten on 
> reboot if that applies to you.
>  
> Thanks
> Shawn
>  
> From: Jens M. Kofoed  
> Sent: Thursday, September 9, 2021 8:25 AM
> To: users@nifi.apache.org
> Subject: Re: ${hostname(true)} returns localhost
>  
> Thanks Shawn
>  
> Somehow the hostname had been added to the /etc/hosts file as 127.0.0.1.
> It is now deleted and it works fine.
>  
> Kind regards 
> Jens 
> 
> On Sep 9, 2021, at 13:41, Shawn Weeks  wrote:
> 
> Do you have nifi.cluster.node.address set to the hostname? I think that will 
> bypass the issue. The actual issue has to do with how the hostname is auto 
> detected, I think. Some things to check are that you have an appropriate 
> entry in /etc/hosts for your hostname pointing back to it’s IP and that you 
> don’t have an entry for the hostname pointing at 127.0.0.1.
>  
> Thanks
> Shawn
>  
> From: Jens M. Kofoed  
> Sent: Thursday, September 9, 2021 5:53 AM
> To: users@nifi.apache.org
> Subject: ${hostname(true)} returns localhost
>  
> Hi
>  
> I'm using nifi 1.14.0 and have a single node cluster and a 3 node cluster. At 
> the single node cluster the ${hostname(true)} returns the FQDN but at my 3 
> node cluster it returns "localhost".
>  
> Any ideas why???
>  
> Kind regards
> Jens M. Kofoed


Re: ${hostname(true)} returns localhost

2021-09-09 Thread Jens M. Kofoed
Thanks Shawn

Somehow the hostname had been added to the /etc/hosts file as 127.0.0.1.
It is now deleted and it works fine.
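
For anyone hitting the same thing, the broken and fixed entries look roughly
like this (hostname and IP are just examples):

# broken: makes ${hostname(true)} resolve to localhost
127.0.0.1   localhost node01.domain.lan

# fixed: keep the FQDN on the node's real address
127.0.0.1   localhost
192.168.1.11   node01.domain.lan node01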

Kind regards 
Jens 

> On Sep 9, 2021, at 13:41, Shawn Weeks  wrote:
> 
> Do you have nifi.cluster.node.address set to the hostname? I think that will 
> bypass the issue. The actual issue has to do with how the hostname is auto 
> detected, I think. Some things to check are that you have an appropriate 
> entry in /etc/hosts for your hostname pointing back to it’s IP and that you 
> don’t have an entry for the hostname pointing at 127.0.0.1.
>  
> Thanks
> Shawn
>  
> From: Jens M. Kofoed  
> Sent: Thursday, September 9, 2021 5:53 AM
> To: users@nifi.apache.org
> Subject: ${hostname(true)} returns localhost
>  
> Hi
>  
> I'm using nifi 1.14.0 and have a single node cluster and a 3 node cluster. At 
> the single node cluster the ${hostname(true)} returns the FQDN but at my 3 
> node cluster it returns "localhost".
>  
> Any ideas why???
>  
> Kind regards
> Jens M. Kofoed


${hostname(true)} returns localhost

2021-09-09 Thread Jens M. Kofoed
Hi

I'm using NiFi 1.14.0 and have a single node cluster and a 3 node cluster.
On the single node cluster ${hostname(true)} returns the FQDN, but on my
3 node cluster it returns "localhost".

Any ideas why???

Kind regards
Jens M. Kofoed


How can HAProxy detect if a node is disconneted

2021-09-09 Thread Jens M. Kofoed
Hi

I'm setting up HAProxy (HAP) in front of our NiFi cluster. Up to now we
have just used DNS round robin and no load balancing. Therefore we are
looking at HAP.
I can see that HAP can check if each node is available by HTTP response 2xx
and 3xx codes. But if I disconnect a node, the web UI is still
available and the HTTP listener is also active.

Is it possible via EL to detect a node's cluster status? If so, I could
create a PG to listen for health checks and respond with a status code
depending on the node's cluster status. Something like the sketch below.
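
What I have in mind on the HAP side looks roughly like this (a sketch only;
/health and port 9999 are placeholders for whatever the HandleHttpRequest
health-check flow would listen on, not NiFi defaults):

backend nifi_ui
    mode http
    option httpchk GET /health
    http-check expect status 200
    server node01 node01.domain.lan:8443 ssl verify none check port 9999
    server node02 node02.domain.lan:8443 ssl verify none check port 9999
    server node03 node03.domain.lan:8443 ssl verify none check port 9999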

Kind regards
Jens M. Kofoed


Re: No Load Balancing since 1.13.2

2021-08-22 Thread Jens M. Kofoed
Hi Mark

Just back at the office after a small holiday :-)
I have tested my setup with nifi 1.14.0 regarding hostname and FQDN.
If I run nslookup node01.domain.lan I get the address 192.168.1.11.
If I configure nifi.cluster.load.balance.host=node01.domain.lan, netstat -l
shows the following:
tcp0  0 localhost:6342 0.0.0.0:*   LISTEN

If I configure nifi.cluster.load.balance.host=192.168.1.11, netstat -l shows
the following:
tcp0  0 node01.domain.lan:6342 0.0.0.0:*   LISTEN

I don't know why it would be different from yours, since I can get the
correct IP via nslookup.

Kind regards
Jens M. Kofoed

On Fri, Aug 6, 2021 at 15:48, Mark Payne  wrote:

> Jens,
>
> You’re right - my mistake, the change from
> “nifi.cluster.load.balance.address” to “nifi.cluster.load.balance.host” was
> in 1.14.0, not early on. In 1.14.0, only nifi.cluster.load.balance.host is
> used. The documentation and properties file both used .host, but the code
> was making use of .address instead. So the code was fixed in 1.14.0 to
> match what the documentation and nifi.properties file specified.
>
> I just did some testing locally on my macbook regarding the IP address vs.
> hostname.
> What I found is that if I use the IP address, it listens as expected.
> If I use just  (not fully qualified), interestingly it listens
> on localhost only.
> If I run: "nslookup " I get back .lan as the fqdn
> If I use ".lan” in my properties, it listens as expected.
>
> Thanks
> -Mark
>
> On Aug 6, 2021, at 12:28 AM, Jens M. Kofoed 
> wrote:
>
> Hi Mark
>
> In version 1.13.2 (at least) the file
> "main/nifi-commons/nifi-properties/src/main/java/org/apache/nifi/util/NiFiProperties.java"
> is looking for a property called "nifi.cluster.load.balance.address" which
> has been reported in https://issues.apache.org/jira/browse/NIFI-8643 and
> fixed in version 1.14.0
>
> In version 1.14.0 the only way I can get it to work, is if I type in the
> IP address. If I don't specified it or type in the fqdn the load balance
> port will bind to localhost. which has been reported in
> https://issues.apache.org/jira/browse/NIFI-9010
> The result from running netstat -l
> tcp 0 0 localhost:6342 0.0.0.0:* LISTEN
>
> Kind regards
> Jens M. Kofoed
>
>
>
On Thu, Aug 5, 2021 at 23:08, Mark Payne  wrote:
>
>> Axel,
>>
>> I think that I can help clarify some of these things.
>>
>> First of all: nifi.cluster.load.balance.host vs.
>> nifi.cluster.load.balance.address
>> * The nifi.cluster.load.balance.host property is what matters.
>>
>> * The nifi.cluster.load.balance.address is not a real property. NiFi has
>> never looked at this property. However, in the first release that included
>> load-balancing, there was a typo in which the nifi.properties file had
>> “…address” instead of “…host”. This was later addressed.
>>
>> * So if you have a value for “nifi.cluster.load.balance.address”, it does
>> nothing and is always ignored.
>>
>>
>>
>> Next: nifi.cluster.load.balance.host property
>>
>> * nifi.cluster.load.balance.host can be either an IP address or a
>> hostname. But if set, other nodes in the cluster MUST be able to
>> communicate with the node using whatever value you put here. So using a
>> value of 0.0.0.0 will not work. Also, if set, NiFi will listen for incoming
>> connections ONLY on that hostname. So if you set it to “localhost”, for
>> instance, no other node can connect to it, because no other host can
>> connect to the node using “localhost”. So this needs to be an address that
>> both the NiFi instance knows about/can bind to, and other nodes in the
>> cluster can connect to.
>>
>> * If nifi.cluster.load.balance.host is NOT set: NiFi will listen for
>> incoming requests on all network interfaces / hostnames. It will advertise
>> its hostname to other nodes in the cluster according to whatever is set for
>> the “nifi.cluster.node.address” property. Meaning that other nodes in the
>> cluster must be able to connect to this node using whatever hostname is set
>> for the “nifi.cluster.node.address” property. If
>> the “nifi.cluster.node.address” property is not set, it advertises its
>> hostname as localhost - which means other nodes won’t be able to send to
>> it.
>>
>> So you must specify either the “nifi.cluster.load.balance.host” property
>> or the “nifi.cluster.node.address” property.
>>
>>
>>
>> Finally: having to delete the state directory
>>
>> If you change the “nifi.cluster.load.balance.host” or
>> “nifi.cluster.load.balance.port” 

Re: Deadlock in loop

2021-08-20 Thread Jens M. Kofoed
I don't know which version of NiFi you are using, but in the newer versions
(1.13 and later) there is the possibility to only allow one incoming flowfile
into a process group at a time. You could create a PG for unzipping where you
can have a loop back if there are zip files inside the parent zip. As long as
there are files in the PG, no new files will enter; see the setting sketched below.
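
For reference, the setting I mean lives in the Process Group configuration
dialog (name from memory, so verify in your version):

FlowFile Concurrency: Single FlowFile Per Node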

Regards 
Jens M. Kofoed 

> On Aug 20, 2021, at 09:49, Aurélien Mazoyer  wrote:
> 
> Hi Shawn,
> 
> Thanks for your answer: that sounds good. However I did not manage to remove 
> the backpressure on queues in Nifi... Is it possible to do it or should I put 
> a super high limit so it would be as if there would be no limit?
> 
> Thanks!
> 
> Aurelien
> 
>> On Thu, Aug 19, 2021 at 18:10, Shawn Weeks  wrote:
>> I've had to deal with this exact problem. You must ensure that every queue
>> inside the loop is set to no back pressure. You can set back pressure on the
>> entrance to and the final exit from the loop to control how much is in the
>> loop, but not in the middle.
>> 
>>  
>> 
>> Thanks
>> 
>> Shawn
>> 
>>  
>> 
>> From: Aurélien Mazoyer  
>> Sent: Thursday, August 19, 2021 10:55 AM
>> To: users@nifi.apache.org
>> Subject: Re: Deadlock in loop
>> 
>>  
>> 
>> Hello,
>> 
>>  
>> 
>> Let me give a bit more info. Here is a screen of my blocked flow :
>> 
>> 
>> 
>> The queues in the loop are all reaching their back pressure limit (set to 
>> 0.2 on my test cluster to reproduce more easily the behavior of my prod 
>> cluster). As a consequence, it seems that all processors that are part of 
>> the loop are stopped.
>> 
>> 
>> Any thoughts about this problem?
>> 
>> 
>> Thank you
>> 
>>  
>> 
>> Aurelien
>> 
>>  
>> 
>>  
>> 
>> Le mer. 18 août 2021 à 16:18, Aurélien Mazoyer  a 
>> écrit :
>> 
>> Hello Joe,
>> 
>>  
>> 
>> Thank you for your email. Sure, please find attached a template that 
>> contains the loop itself.
>> 
>>  
>> 
>> Best,
>> 
>>  
>> 
>> Aurelien
>> 
>>  
>> 
>> Le mer. 18 août 2021 à 14:31, Joe Witt  a écrit :
>> 
>> Hello
>> 
>>  
>> 
>> This case should work very well.  Please share the details of the flow 
>> configuration.  Can you download a flow template and share that?
>> 
>>  
>> 
>> thanks
>> 
>>  
>> 
>> On Wed, Aug 18, 2021 at 8:20 AM Aurélien Mazoyer  
>> wrote:
>> 
>> Hi,
>> 
>>  
>> 
>> I have a nifi flow that reads zip files. For each non-zip file it performs 
>> some treatment on its content and for each zip file it unzips it and 
>> performs the treatment on files in the archive. There is a loop in the flow 
>> so if a zip contains a zip, this zip will be reinjected at the beginning of 
>> the flow to be processed (and so on). However, when I have several zips in 
>> an archive, I experience a deadlock in my loop. Is there a solution to 
>> mitigate this issue in NiFi, such as having back pressure on the first 
>> processor of the loop depending on the state of the queues in the loop? 
>> 
>>  
>> 
>> Thank you,
>> 
>>  
>> 
>> Aurelien


Re: No Load Balancing since 1.13.2

2021-08-05 Thread Jens M. Kofoed
Hi Mark

In version 1.13.2 (at least) the file
"main/nifi-commons/nifi-properties/src/main/java/org/apache/nifi/util/NiFiProperties.java"
is looking for a property called "nifi.cluster.load.balance.address" which
has been reported in https://issues.apache.org/jira/browse/NIFI-8643 and
fixed in version 1.14.0

In version 1.14.0 the only way I can get it to work is if I type in the IP
address. If I don't specify it, or if I type in the FQDN, the load balance port
will bind to localhost, which has been reported in
https://issues.apache.org/jira/browse/NIFI-9010
The result from running netstat -l
tcp 0 0 localhost:6342 0.0.0.0:* LISTEN
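
So, to illustrate, the only variant that binds correctly for me looks like
this (the IP address is an example; each node uses its own):

nifi.cluster.load.balance.host=192.168.1.10
nifi.cluster.load.balance.port=6342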

Kind regards
Jens M. Kofoed



Den tor. 5. aug. 2021 kl. 23.08 skrev Mark Payne :

> Axel,
>
> I think that I can help clarify some of these things.
>
> First of all: nifi.cluster.load.balance.host vs.
> nifi.cluster.load.balance.address
> * The nifi.cluster.load.balance.host property is what matters.
>
> * The nifi.cluster.load.balance.address is not a real property. NiFi has
> never looked at this property. However, in the first release that included
> load-balancing, there was a typo in which the nifi.properties file had
> “…address” instead of “…host”. This was later addressed.
>
> * So if you have a value for “nifi.cluster.load.balance.address”, it does
> nothing and is always ignored.
>
>
>
> Next: nifi.cluster.load.balance.host property
>
> * nifi.cluster.load.balance.host can be either an IP address or a
> hostname. But if set, other nodes in the cluster MUST be able to
> communicate with the node using whatever value you put here. So using a
> value of 0.0.0.0 will not work. Also, if set, NiFi will listen for incoming
> connections ONLY on that hostname. So if you set it to “localhost”, for
> instance, no other node can connect to it, because no other host can
> connect to the node using “localhost”. So this needs to be an address that
> both the NiFi instance knows about/can bind to, and other nodes in the
> cluster can connect to.
>
> * If nifi.cluster.load.balance.host is NOT set: NiFi will listen for
> incoming requests on all network interfaces / hostnames. It will advertise
> its hostname to other nodes in the cluster according to whatever is set for
> the “nifi.cluster.node.address” property. Meaning that other nodes in the
> cluster must be able to connect to this node using whatever hostname is set
> for the “nifi.cluster.node.address” property. If
> the “nifi.cluster.node.address” property is not set, it advertises its
> hostname as localhost - which means other nodes won’t be able to send to
> it.
>
> So you must specify either the “nifi.cluster.load.balance.host” property
> or the “nifi.cluster.node.address” property.
>
>
>
> Finally: having to delete the state directory
>
> If you change the “nifi.cluster.load.balance.host” or
> “nifi.cluster.load.balance.port” property and restart a node, you must
> restart all nodes in the cluster. Otherwise, the other nodes won’t be able
> to send to that node.
> So, for example, when you changed the load.balance.host from fqdn or
> 0.0.0.0 to the IP address - the other nodes in the cluster would stop
> sending. I created a JIRA [1] for that. In my testing, when I changed the
> hostname, the other nodes stopped sending. But restarting them got things
> back on track. I wasn’t able to replicate the issue after restarting all
> nodes.
>
> Hope this is helpful!
> -Mark
>
> [1] https://issues.apache.org/jira/browse/NIFI-9017
>
>
> On Aug 3, 2021, at 3:08 AM, Axel Schwarz  wrote:
>
> Hey guys,
>
> I think I found the "trick" for at least version 1.13.2 and of course I'll
> share it with you.
> I now use the following load balancing properties:
>
> # cluster load balancing properties #
> nifi.cluster.load.balance.host=192.168.1.10
> nifi.cluster.load.balance.port=6342
> nifi.cluster.load.balance.connections.per.node=4
> nifi.cluster.load.balance.max.thread.count=8
> nifi.cluster.load.balance.comms.timeout=30 sec
>
> So I use the hosts IP address for balance.host instead of 0.0.0.0 or the
> fqdn and have no balance.address property at all.
> This led to partial load balancing in my case, as already mentioned. It
> looked like I needed one more step to reach the goal, and this step
> seems to be deleting all state management files.
>
> Through the state-management.xml config file I changed the state
> management directory to be outside of the NiFi installation, because the
> config file says "it is important, that the directory be copied over to the
> new version when upgrading NiFi". So every time I upgraded or
> reinstalled NiFi during my load balancing odyssey, the state management
> remained completely

Can't connect to Redis Sentinels after upgrading to v.1.14.0

2021-08-05 Thread Jens M. Kofoed
igured.

Please any help :-)

kind regards
Jens M. Kofoed


Re: Re: Re: No Load Balancing since 1.13.2

2021-08-04 Thread Jens M. Kofoed
Hi Axel

In version 1.13.2 I had to change the nifi.cluster.load.balance.host to
nifi.cluster.load.balance.address and I used the IP address to get it to
work.

In version 1.14.0 they have fixed the bug, so now it actually uses the
nifi.cluster.load.balance.host property, and I had to change back again in
my properties. But it still doesn't work if I use the hostname; I still have
to use the IP address.
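
To illustrate the difference (IP address is an example):

# 1.13.2 - workaround for the property-name mismatch
nifi.cluster.load.balance.address=192.168.1.10

# 1.14.0 - the documented property works again
nifi.cluster.load.balance.host=192.168.1.10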

I hope it is working for you.
Kind Regards
Jens M. Kofoed

Den tor. 29. jul. 2021 kl. 11.08 skrev Jens M. Kofoed <
jmkofoed@gmail.com>:

> Hmm... I can't remember :-( sorry
>
> My configuration for version 1.13.2 is like this:
> # cluster node properties (only configure for cluster nodes) #
> nifi.cluster.is.node=true
> nifi.cluster.node.address=nifi-node01.domaine.com
> nifi.cluster.node.protocol.port=9443
> nifi.cluster.node.protocol.threads=10
> nifi.cluster.node.protocol.max.threads=50
> nifi.cluster.node.event.history.size=25
> nifi.cluster.node.connection.timeout=5 sec
> nifi.cluster.node.read.timeout=5 sec
> nifi.cluster.node.max.concurrent.requests=100
> nifi.cluster.firewall.file=
> nifi.cluster.flow.election.max.wait.time=5 mins
> nifi.cluster.flow.election.max.candidates=3
>
> # cluster load balancing properties #
> nifi.cluster.load.balance.address=192.168.1.11
> nifi.cluster.load.balance.port=6111
> nifi.cluster.load.balance.connections.per.node=4
> nifi.cluster.load.balance.max.thread.count=8
> nifi.cluster.load.balance.comms.timeout=30 sec
>
> So I defined "nifi.cluster.node.address" with the hostname and not an IP
> address, and "nifi.cluster.load.balance.address" with the IP address of
> the server.
> And triple-checked the configuration on all servers :-)
>
> Kind Regards
> Jens M. Kofoed
>
>
> Den tor. 29. jul. 2021 kl. 10.11 skrev Axel Schwarz  >:
>
>> Hey Jens,
>>
>> in Issue Nifi-8643 you wrote the last comment with the exactly same
>> behaviour as we're experiencing now. 2 of 3 nodes were load balancing.
>> How did you get the third node to participate in load balancing? An
>> update to 1.14.0 does not change anything for us.
>>
>>
>> https://issues.apache.org/jira/browse/NIFI-8643?focusedCommentId=17361418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17361418
>>
>>
>> --- Ursprüngliche Nachricht ---
>> Von: "Jens M. Kofoed" 
>> Datum: 28.07.2021 12:07:50
>> An: users@nifi.apache.org, Axel Schwarz 
>> Betreff: Re: Re: No Load Balancing since 1.13.2
>>
>> > hi
>> >
>> > I can see that you have configured
>> nifi.cluster.load.balance.address=0.0.0.0
>> >
> > Have you tried to set the correct IP address?
>> > node1: nifi.cluster.load.balance.address=192.168.1.10
>> > node2: nifi.cluster.load.balance.address=192.168.1.11
>> > node3: nifi.cluster.load.balance.address=192.168.1.12
>> >
>> > regards
>> > Jens M. Kofoed
>> >
>> > Den ons. 28. jul. 2021 kl. 11.17 skrev Axel Schwarz <
>> axelkop...@emailn.de>:
>> >
>> >
>> > > Just tried Java 11. But still does not work. Nothing changed. :(
>> > >
>> > > --- Ursprüngliche Nachricht ---
>> > > Von: Jorge Machado 
>> > > Datum: 27.07.2021 13:08:55
>> > > An: users@nifi.apache.org,  Axel Schwarz 
>> >
>> > > Betreff: Re: No Load Balancing since 1.13.2
>> > >
>> > > > Did you try Java 11? I have a client running a similar setup to yours
>> > > > but with a lower NiFi version and it works fine. Maybe it is worth
>> > > > trying it.
>> > > >
>> > > >
>> > > > > On 27. Jul 2021, at 12:42, Axel Schwarz 
>> >
>> > > > wrote:
>> > > > >
>> > > > > I did indeed, but I updated from u161 to u291, as this was
>> > the newest
>> > > > version at that time, because I thought it could help.
>> > > > > So the issue started under u161. But I just saw that u301
>> > is out. I
>> > > > will try this as well.
>> > > > > --- Ursprüngliche Nachricht ---
>> > > > > Von: Pierre Villard 
>> > > > > Datum: 27.07.2021 10:18:38
>> > > > > An: users@nifi.apache.org, Axel Schwarz 
>> >
>> > > >
>> > > > > Betreff: Re: No Load Balancing since 1.13.2
>> > > > >
>> > > > > Hi,
>> > > > >
>> > > > > I bel

Re: Re: Re: No Load Balancing since 1.13.2

2021-07-29 Thread Jens M. Kofoed
Hmm... I can't remember :-( sorry

My configuration for version 1.13.2 is like this:
# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi-node01.domaine.com
nifi.cluster.node.protocol.port=9443
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=3

# cluster load balancing properties #
nifi.cluster.load.balance.address=192.168.1.11
nifi.cluster.load.balance.port=6111
nifi.cluster.load.balance.connections.per.node=4
nifi.cluster.load.balance.max.thread.count=8
nifi.cluster.load.balance.comms.timeout=30 sec

So I defined "nifi.cluster.node.address" with the hostname and not an IP
address, and "nifi.cluster.load.balance.address" with the IP address of
the server.
And triple-checked the configuration on all servers :-)

Kind Regards
Jens M. Kofoed


Den tor. 29. jul. 2021 kl. 10.11 skrev Axel Schwarz :

> Hey Jens,
>
> in Issue Nifi-8643 you wrote the last comment with the exactly same
> behaviour as we're experiencing now. 2 of 3 nodes were load balancing.
> How did you get the third node to participate in load balancing? An update
> to 1.14.0 does not change anything for us.
>
>
> https://issues.apache.org/jira/browse/NIFI-8643?focusedCommentId=17361418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17361418
>
>
> --- Ursprüngliche Nachricht ---
> Von: "Jens M. Kofoed" 
> Datum: 28.07.2021 12:07:50
> An: users@nifi.apache.org, Axel Schwarz 
> Betreff: Re: Re: No Load Balancing since 1.13.2
>
> > hi
> >
> > I can see that you have configured
> nifi.cluster.load.balance.address=0.0.0.0
> >
> > Have you tried to set the correct IP address?
> > node1: nifi.cluster.load.balance.address=192.168.1.10
> > node2: nifi.cluster.load.balance.address=192.168.1.11
> > node3: nifi.cluster.load.balance.address=192.168.1.12
> >
> > regards
> > Jens M. Kofoed
> >
> > Den ons. 28. jul. 2021 kl. 11.17 skrev Axel Schwarz <
> axelkop...@emailn.de>:
> >
> >
> > > Just tried Java 11. But still does not work. Nothing changed. :(
> > >
> > > --- Ursprüngliche Nachricht ---
> > > Von: Jorge Machado 
> > > Datum: 27.07.2021 13:08:55
> > > An: users@nifi.apache.org,  Axel Schwarz 
> >
> > > Betreff: Re: No Load Balancing since 1.13.2
> > >
> > > > Did you try Java 11? I have a client running a similar setup to yours
> > > > but with a lower NiFi version and it works fine. Maybe it is worth
> > > > trying it.
> > > >
> > > >
> > > > > On 27. Jul 2021, at 12:42, Axel Schwarz 
> >
> > > > wrote:
> > > > >
> > > > > I did indeed, but I updated from u161 to u291, as this was
> > the newest
> > > > version at that time, because I thought it could help.
> > > > > So the issue started under u161. But I just saw that u301
> > is out. I
> > > > will try this as well.
> > > > > --- Ursprüngliche Nachricht ---
> > > > > Von: Pierre Villard 
> > > > > Datum: 27.07.2021 10:18:38
> > > > > An: users@nifi.apache.org, Axel Schwarz 
> >
> > > >
> > > > > Betreff: Re: No Load Balancing since 1.13.2
> > > > >
> > > > > Hi,
> > > > >
> > > > > I believe the minor u291 is known to have issues (for some
> > of its early
> > > > builds). Did you upgrade the Java version recently?
> > > > >
> > > > > Thanks,
> > > > > Pierre
> > > > >
> > > > > Le mar. 27 juil. 2021 à 08:07, Axel Schwarz  >
> > > > <mailto:axelkop...@emailn.de>> a écrit :
> > > > > Dear Community,
> > > > >
> > > > > we're running a secured 3 node Nifi Cluster on Java 8_u291
> > and Debian
> > > > 7 and experiencing
> > > > > problems with load balancing since version 1.13.2.
> > > > >
> > > > I'm fully aware of Issue NIFI-8643 and tested a lot around
> > this, but
> > > > have to say that this
> > > > > is not our problem. Mainly because the balance port never
> > binds to
> > > localhost,
> >

XML Reader: Tags with Attributes not working

2021-07-28 Thread Jens M. Kofoed
Hi

I'm trying to convert some XML data and am struggling with XML attributes and
field content. Therefore I tried the example in the documentation for the
XML Reader, but this doesn't work either.

Using the data and settings from the example, I only get the content
value, not the data from the tag's attribute.
(Another thing: the examples are missing a }, so you can't just
copy/paste from the documentation page:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-record-serialization-services-nar/1.7.0/org.apache.nifi.xml.XMLReader/additionalDetails.html
)

GenerateFlowfile -> ConvertRecord (XMLReader->JSONSetWriter)

A GenerateFlowfile process with this text:

<field_with_attribute attr="attr content">content of
field</field_with_attribute>


XMLReader Settings:
Schema Access Strategy: Use 'Schema Text' Property
Schema Text:
{   "name": "test",
"namespace": "nifi",
"type": "record",
"fields": [
{   "name": "field_with_attribute",
"type":
{   "name": "RecordForTag",
"type": "record",
"fields" : [
{ "name": "attr", "type": "string"},
{"name": "field_name_for_content", "type": "string"}
  ]
  }
  }
  ]
  }
Expect Records as Array: false
Attribute Prefix: prefix_
Field Name for Content: field_name_for_content

The JSON Output is the following:
{
  "field_with_attribute" : {
"attr" : null,
"field_name_for_content" : "content of field"
  }
}
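
What I expected, following the example, was the attribute value instead of
null, i.e. something like this (assuming the reader picks the attribute up;
depending on how the Attribute Prefix is applied, the field might also come
out as "prefix_attr"):

{
  "field_with_attribute" : {
    "attr" : "attr content",
    "field_name_for_content" : "content of field"
  }
}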

Kind regards
Jens M. Kofoed


Re: Need help to create avro schema for arrays with tags

2021-07-28 Thread Jens M. Kofoed
The problem is with the XML Reader converting XML attributes and field
contents at the same time.
I tried to use the example "Tags with Attributes" from the XML Reader
additional details help page:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-record-serialization-services-nar/1.7.0/org.apache.nifi.xml.XMLReader/additionalDetails.html
In this example I don't get the attribute data from the XML tag, only the
content value. In my own dataset I do get the attribute data, but not the
content value.

GenerateFlowfile -> ConvertRecord (XMLReader->JSONSetWriter)

A GenerateFlowfile process with this text:


<root>
<field_with_attribute attr="attr content">content of
field</field_with_attribute>
</root>



XMLReader Settings:
Schema Access Strategy: Use 'Schema Text' Property
Schema Text:
{   "name": "test",
"namespace": "nifi",
"type": "record",
"fields": [
{   "name": "field_with_attribute",
"type":
{   "name": "RecordForTag",
"type": "record",
"fields" : [
{ "name": "attr", "type": "string"},
{"name": "field_name_for_content", "type": "string"}
  ]
  }
  }
  ]
  }
Expect Records as Array: true
Attribute Prefix: prefix_
Field Name for Content: field_name_for_content

The JSON Output is the following:
{
  "field_with_attribute" : {
"attr" : null,
"field_name_for_content" : "content of field"
  }
}

Kind Regards
Jens M. Kofoed

Den ons. 28. jul. 2021 kl. 19.06 skrev Etienne Jouvin <
lapinoujou...@gmail.com>:

> Hello.
>
> What you can do is write the expected JSON.
> Then use a ConvertRecord processor with a JSON tree reader and a JSON
> writer. On the writer, specify that you want the schema written to an
> attribute.
> Like this, you will have the wanted schema.
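>
> As a minimal sketch (processor and property names from NiFi 1.9+):
>
> GenerateFlowFile (content = the expected JSON)
>   -> ConvertRecord
>      Record Reader: JsonTreeReader
>        Schema Access Strategy: Infer Schema
>      Record Writer: JsonRecordSetWriter
>        Schema Write Strategy: Set 'avro.schema' Attribute
>
> The inferred Avro schema then appears in the avro.schema attribute of the
> output flow file.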
>
> Etienne
>
>
> Le mer. 28 juil. 2021 à 14:51, Jens M. Kofoed  a
> écrit :
>
>> If I use the following schema:
>> { "name":"object","type":["null",{
>> "name":"objectRecord","type":"record","fields":[{ "name":
>> objectDetails","type": {"name": "objectDetails","type": "record","fields":
>> [{ "name": additionalInfo","type": {"type": "array","items": {"name":
>> "additionalInfo","type": "record","fields": [{"name": "name","type":
>> "string"},{"name":
>> "value","type":["null","string","int","boolean"]}]}}}]}},{
>> name":"objectIdentification","type":["null",{
>> "name":"objectIdentificationRecord","type":"record","fields":[{"name":"objectId","type":["null","int"]},{"name":"objectType","type":["null","string"]}]}]}]}]}
>>
>> It is almost there. All I'm missing is the value for the tagX fields.
>>
>> Please help
>>
>> regards
>> Jens M. Kofoed
>>
>


RE: Need help to create avro schema for arrays with tags

2021-07-28 Thread Jens M. Kofoed
If I use the following schema:
{ "name":"object","type":["null",{
"name":"objectRecord","type":"record","fields":[{ "name":
objectDetails","type": {"name": "objectDetails","type": "record","fields":
[{ "name": additionalInfo","type": {"type": "array","items": {"name":
"additionalInfo","type": "record","fields": [{"name": "name","type":
"string"},{"name":
"value","type":["null","string","int","boolean"]}]}}}]}},{
name":"objectIdentification","type":["null",{
"name":"objectIdentificationRecord","type":"record","fields":[{"name":"objectId","type":["null","int"]},{"name":"objectType","type":["null","string"]}]}]}]}]}

It is almost there. All I'm missing is the value for the tagX fields.

Please help

regards
Jens M. Kofoed


Need help to create avro schema for arrays with tags

2021-07-28 Thread Jens M. Kofoed
Dear community

I'm struggling with transforming some XML data into a JSON format using an
Avro schema.
The data which I can't get to work looks something like this:


<object>
  <objectDetails>
    <tag1>value1</tag1>
    <tag2>value2</tag2>
    <tag3>value3</tag3>
  </objectDetails>
  <objectIdentification>
    <objectId>1</objectId>
    <objectType>objType</objectType>
  </objectIdentification>
</object>



If I set the type for additionalInfo to array, I only get the values. I
tried to set the array items to a record, but I can't get the tag names.

My goal is to get a json like this:
"object" : {
"objectDetails" : [
{ "additionalInfo" : "tag1", "value":"value1"}
{ "additionalInfo" : "tag2", "value":"value2"}
{ "additionalInfo" : "tag3", "value":"value3"}
],
"objectIdentification" : {
  "objectId" : 1,
  "objectType" : " objType  "
}
  }

or

"object" : {
"objectDetails" : {
"additionalInfo" : [
 {"name":"tag1", "value":"value1"},
 { "name":"tag2", "value":"value2"},
 { "name":"tag3", "value":"value3"}
]
},
"objectIdentification" : {
  "objectId" : 1,
  "objectType" : " objType  "
}
  }

Kind regards
Jens M. Kofoed


Re: Re: No Load Balancing since 1.13.2

2021-07-28 Thread Jens M. Kofoed
hi

I can see that you have configured nifi.cluster.load.balance.address=0.0.0.0
Have you tried to set the correct IP address?
node1: nifi.cluster.load.balance.address=192.168.1.10
node2: nifi.cluster.load.balance.address=192.168.1.11
node3: nifi.cluster.load.balance.address=192.168.1.12

regards
Jens M. Kofoed

Den ons. 28. jul. 2021 kl. 11.17 skrev Axel Schwarz :

> Just tried Java 11. But still does not work. Nothing changed. :(
>
> --- Ursprüngliche Nachricht ---
> Von: Jorge Machado 
> Datum: 27.07.2021 13:08:55
> An: users@nifi.apache.org,  Axel Schwarz 
> Betreff: Re: No Load Balancing since 1.13.2
>
> > Did you try Java 11? I have a client running a similar setup to yours
> > but with a lower NiFi version and it works fine. Maybe it is worth trying
> > it.
> >
> >
> > > On 27. Jul 2021, at 12:42, Axel Schwarz 
> > wrote:
> > >
> > > I did indeed, but I updated from u161 to u291, as this was the newest
> > version at that time, because I thought it could help.
> > > So the issue started under u161. But I just saw that u301 is out. I
> > will try this as well.
> > > --- Ursprüngliche Nachricht ---
> > > Von: Pierre Villard 
> > > Datum: 27.07.2021 10:18:38
> > > An: users@nifi.apache.org, Axel Schwarz 
> >
> > > Betreff: Re: No Load Balancing since 1.13.2
> > >
> > > Hi,
> > >
> > > I believe the minor u291 is known to have issues (for some of its early
> > builds). Did you upgrade the Java version recently?
> > >
> > > Thanks,
> > > Pierre
> > >
> > > Le mar. 27 juil. 2021 à 08:07, Axel Schwarz  > <mailto:axelkop...@emailn.de>> a écrit :
> > > Dear Community,
> > >
> > > we're running a secured 3 node Nifi Cluster on Java 8_u291 and Debian
> > 7 and experiencing
> > > problems with load balancing since version 1.13.2.
> > >
> > > I'm fully aware of Issue NIFI-8643 and tested a lot around this, but
> > have to say that this
> > > is not our problem. Mainly because the balance port never binds to
> localhost,
> > but also because I
> > > implemented all workarounds under version 1.13.2 and even tried version
> > 1.14.0 by now,
> > > but load blancing still does not work.
> > > What we experience is best described as "the primary node balances
> > with itself"...
> > >
> > > So what it does is, opening the balancing connections to its own IP
> > instead of the IPs
> > > of the other two nodes. And the other two nodes don't open balancing
> > connections at all.
> > >
> > > When executing "ss | grep 6342" on the primary node, this
> > is what it looks like:
> > >
> > > [root@nifiHost1 conf]# ss | grep 6342
> > > tcpESTAB  0  0  192.168.1.10:51380 <
> http://192.168.1.10:51380/>
> >192.168.1.10:6342 <http://192.168.1.10:6342/>
> >
> > > tcpESTAB  0  0  192.168.1.10:51376 <
> http://192.168.1.10:51376/>
> >192.168.1.10:6342 <http://192.168.1.10:6342/>
> >
> > > tcpESTAB  0  0  192.168.1.10:51378 <
> http://192.168.1.10:51378/>
> >192.168.1.10:6342 <http://192.168.1.10:6342/>
> >
> > > tcpESTAB  0  0  192.168.1.10:51370 <
> http://192.168.1.10:51370/>
> >192.168.1.10:6342 <http://192.168.1.10:6342/>
> >
> > > tcpESTAB  0  0  192.168.1.10:51372 <
> http://192.168.1.10:51372/>
> >192.168.1.10:6342 <http://192.168.1.10:6342/>
> >
> > > tcpESTAB  0  0  192.168.1.10:6342 <
> http://192.168.1.10:6342/>
> > 192.168.1.10:51376 <http://192.168.1.10:51376/>
> >
> > > tcpESTAB  0  0  192.168.1.10:51374 <
> http://192.168.1.10:51374/>
> >192.168.1.10:6342 <http://192.168.1.10:6342/>
> >
> > > tcpESTAB  0  0  192.168.1.10:6342 <
> http://192.168.1.10:6342/>
> > 192.168.1.10:51374 <http://192.168.1.10:51374/>
> >
> > > tcpESTAB  0  0  192.168.1.10:51366 <
> http://192.168.1.10:51366/>
> >192.168.1.10:6342 <http://192.168.1.10:6342/>
> >
> > > tcpESTAB  0  0  192.168.1.10:6342 <
> http://192.168.1.10:6342/>
> > 192.168.1.10:51370 <http://192.168.1.10:51370/>
> >
&

Re: Provenance for SplitRecord has a wrong output claim file

2021-07-28 Thread Jens M. Kofoed
Thanks Pierre

You are right, I didn't think of that. What should the output claim file
be if there are multiple outputs? The output claim file is of course the
original file :-) silly me

regards
Jens M. Kofoed

Den ons. 28. jul. 2021 kl. 11.10 skrev Pierre Villard <
pierre.villard...@gmail.com>:

> Hi,
>
> I believe this is expected. If you have one XML file split into 10 JSON
> files, what would you expect for the output claim?
> You can use the provenance event to get the child flow files, and retrieve
> the claims from there.
> Also note that one claim file can contain multiple flow files but with
> different offsets.
>
> Hope this helps,
> Pierre
>
> Le mer. 28 juil. 2021 à 10:18, Jens M. Kofoed  a
> écrit :
>
>> Hi
>>
>> I use a SplitRecord with an XMLReader and a JSONRecordSetWriter. When
>> looking at provenance data for the process, the output claim file is equal
>> to the input claim file, which means that both files are XML files. This is
>> wrong; the output claim file should be the JSON file.
>> If using a ConvertRecord process, the output claim file is equal to the
>> file coming out of the process.
>>
>> Kind regards
>> Jens M. Kofoed
>>
>


Provenance for SplitRecord has a wrong output claim file

2021-07-28 Thread Jens M. Kofoed
Hi

I use a SplitRecord with an XMLReader and a JSONRecordSetWriter. When
looking at provenance data for the process, the output claim file is equal
to the input claim file, which means that both files are XML files. This is
wrong; the output claim file should be the JSON file.
If using a ConvertRecord process, the output claim file is equal to the file
coming out of the process.

Kind regards
Jens M. Kofoed


Re: NiFi Queue Monitoring

2021-07-27 Thread Jens M. Kofoed
Why not use the NiFi wiki page at Confluence?
https://cwiki.apache.org/confluence/display/NIFI
There are so many great people who have written wonderful blogs about
NiFi, but for new users it is a nightmare to find them all. I think it
would be great if many of the wonderful tips and guides could be added to
the wiki. If not copied directly to the wiki, at least linked.

regards
Jens M. Kofoed

Den ons. 28. jul. 2021 kl. 00.15 skrev Matt Burgess :

> I’m planning on doing one all about QueryNiFiReportingTask and the
> RecordSinks, I can include this use case if you like, but would definitely
> encourage you to blog it as well :) my blog is at
> https://funnifi.blogspot.com as an example, there are many others as well.
>
> Regards,
> Matt
>
> On Jul 27, 2021, at 5:17 PM, scott  wrote:
>
> 
> Joe,
> I'm not sure. What would be involved? I'm not familiar with a NiFi blog,
> can you point me to some examples?
>
> Thanks,
> Scott
>
> On Tue, Jul 27, 2021 at 10:00 AM Joe Witt  wrote:
>
>> Scott
>>
>> This sounds pretty darn cool.  Any chance you'd be interested in
>> kicking out a blog on it?
>>
>> Thanks
>>
>> On Tue, Jul 27, 2021 at 9:58 AM scott  wrote:
>> >
>> > Matt/all,
>> > I was able to solve my problem using the QueryNiFiReportingTask with
>> "SELECT * FROM CONNECTION_STATUS WHERE isBackPressureEnabled = true" and
>> the new LoggingRecordSink as you suggested. Everything is working
>> flawlessly now. Thank you again!
>> >
>> > Scott
>> >
>> > On Wed, Jul 21, 2021 at 5:09 PM Matt Burgess 
>> wrote:
>> >>
>> >> Scott,
>> >>
>> >> Glad to hear it! Please let me know if you have any questions or if
>> >> issues arise. One thing I forgot to mention is that I think
>> >> backpressure prediction is disabled by default due to the extra
>> >> consumption of CPU to do the regressions, make sure the
>> >> "nifi.analytics.predict.enabled" property in nifi.properties is set to
>> >> "true" before starting NiFi.
>> >>
>> >> Regards,
>> >> Matt
>> >>
>> >> On Wed, Jul 21, 2021 at 7:21 PM scott  wrote:
>> >> >
>> >> > Excellent! Very much appreciate the help and for setting me on the
>> right path. I'll give the queryNiFiReportingTask code a try.
>> >> >
>> >> > Scott
>> >> >
>> >> > On Wed, Jul 21, 2021 at 3:26 PM Matt Burgess 
>> wrote:
>> >> >>
>> >> >> Scott et al,
>> >> >>
>> >> >> There are a number of options for monitoring flows, including
>> >> >> backpressure and even backpressure prediction:
>> >> >>
>> >> >> 1) The REST API for metrics. As you point out, it's subject to the
>> >> >> same authz/authn as any other NiFi operation and doesn't sound like
>> it
>> >> >> will work out for you.
>> >> >> 2) The Prometheus scrape target via the REST API. The issue would be
>> >> >> the same as #1 I presume.
>> >> >> 3) PrometheusReportingTask. This is similar to the REST scrape
>> target
>> >> >> but isn't subject to the usual NiFi authz/authn stuff, however it
>> does
>> >> >> support SSL/TLS for a secure solution (and is also a "pull" approach
>> >> >> despite it being a reporting task)
>> >> >> 4) QueryNiFiReportingTask. This is not included with the NiFi
>> >> >> distribution but can be downloaded separately, the latest version
>> >> >> (1.14.0) is at [1]. I believe this is what Andrew was referring to
>> >> >> when he mentioned being able to run SQL queries over the
>> information,
>> >> >> you can do something like "SELECT * FROM
>> CONNECTION_STATUS_PREDICTIONS
>> >> >> WHERE predictedTimeToBytesBackpressureMillis < 1". This can be
>> >> >> done either as a push or pull depending on the Record Sink you
>> choose.
>> >> >> A SiteToSiteReportingRecordSink, KafkaRecordSink, or
>> LoggingRecordSink
>> >> >> results in a push (to NiFi, Kafka, or nifi-app.log respectively),
>> >> >> where a PrometheusRecordSink results in a pull the same as #2 and
>> #3.
>> >> >> There's even a ScriptedRecordSink where you can write your own
>> script

Re: Using a corporate SSL signed certificate

2021-06-15 Thread Jens M. Kofoed
Hi Emmanuel

I don't use the toolkit, I just do it manually.
I have found that a normal server certificate generated in Microsoft
Windows is not working. The certificate for NiFi servers has to have both
the serverAuth and clientAuth extended key usages. So I have created a new
certificate profile in our PKI server for NiFi servers.
Next I create a server certificate for node1, with the following settings
Common name = node1.domain.net
alternative names:
dns = node1.domain.net
dns = clustername.domain.net

When I export the certificate (as a pfx file) I export it with the private
key and protect it with a password. I also export my CA and ICA
certificates and copy them all to the node1 server.
To create the keystore file I use the following command:
keytool -importkeystore -destkeystore keystore.jks -srcstoretype PKCS12
-deststoretype jks -srckeystore node1.domain.net.pfx
Here you will have to provide the password for the certificate and set a
password for the keystore. I use the same password for both.
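
To verify that the imported certificate really has both usages, something
like this can be used (file name and password are examples):

keytool -list -v -keystore keystore.jks -storepass <keystore password>

and check that the ExtendedKeyUsages section lists both serverAuth and
clientAuth.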

To create a truststore I use the following commands:
keytool -keystore truststore.jks -storetype jks -importcert
-trustcacerts -file CA.domain.net.cer -alias CA-DOMAIN
keytool -keystore truststore.jks -importcert -file ICA.domain.net.cer
-alias ICA-DOMAIN
You will have to provide a password for the truststore.

Now you will have to manually edit the nifi.properties file for the path to
the files and the passwords.
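
The relevant lines look something like this (paths are examples):

nifi.security.keystore=./conf/keystore.jks
nifi.security.keystoreType=jks
nifi.security.keystorePasswd=<keystore password>
nifi.security.keyPasswd=<certificate password>
nifi.security.truststore=./conf/truststore.jks
nifi.security.truststoreType=jks
nifi.security.truststorePasswd=<truststore password>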

Just repeat the steps above for the other nodes. Keep in mind that if you
later want to use a StandardSSLContextService with the keystore on each node,
the password for the certificate and keystores has to be the same on all
nodes.

For accessing my secure NiFi cluster afterwards, I simply create a user
certificate for myself in Windows and configure authorizers.xml with
the certificate name "CERTIFICATE".
Keep in mind that NiFi is case sensitive. Therefore I use identity mappings
in the nifi.properties file:
nifi.security.identity.mapping.pattern.dn=^(.*)$
nifi.security.identity.mapping.value.dn=$1
nifi.security.identity.mapping.transform.dn=LOWER
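
For example, a certificate DN like "CN=Jens M. Kofoed, OU=Users, DC=domain,
DC=net" (a made-up example) is matched as a whole by the pattern and then
lowercased to "cn=jens m. kofoed, ou=users, dc=domain, dc=net", so the user
entries just have to be written in lower case.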

This works fine for me.

Kind regards
Jens M. Kofoed


Den man. 14. jun. 2021 kl. 15.39 skrev QUEVILLON EMMANUEL - EXT-SAFRAN
ENGINEERING SERVICES (SAFRAN) :

> Hi list,
>
>
>
> We are trying to set up a secure NiFi installation using an SSL certificate
> signed by our corporate CA.
>
> This SSL certificate is signed for a domain name we’d like to use to
> access our nifi server(s).
>
> We’ve been unable to create a new certificate for our server using
> tls-toolkit for the main admin user identity to connect.
>
> 1)  Is it possible to use such SSL signed certificate to create a new
> one with tls-toolkit?
>
>
>
> We’ve followed this documentation
> https://nifi.apache.org/docs/nifi-docs/html/toolkit-guide.html#tls_intermediate_ca
> and copied respective files and key to the right location and ran
> tls-toolkit command. However, tls-toolkit throws an error complaining "The
> signing certificate was not signed by any known certificates".
>
>
>
> We’ve also tried with the full chain certificate as an additional
> certificate file (option –additionalCACertificate), but it looks like
> tls-toolkit does not find the whole certificate chain and stops at the first
> level of the chain.
>
>
>
> Has anyone faced the same problem?
>
> Any help or advice will be appreciated.
>
>
>
> Thanks, regards
>
>
>
> Emmanuel
>


Re: BUG??? NiFi 1.13.2 Cluster Load Balance binds to localhost and is not accessible

2021-06-11 Thread Jens M. Kofoed
This is a bug. There is a mismatch between the documentation,
nifi.properties and the Java class handling the properties:
https://issues.apache.org/jira/browse/NIFI-8643
kind regards
Jens

Den fre. 11. jun. 2021 kl. 07.51 skrev Jens M. Kofoed <
jmkofoed@gmail.com>:

> Adding this line in the nifi.properties gets it to work:
> nifi.cluster.load.balance.address=0.0.0.0
>
> There is a mismatch between nifi.properties and the Java file handling
> the properties:
> nifi.properties:  nifi.cluster.load.balance.host
> nifi-commons/nifi-properties/src/main/java/org/apache/nifi/util/NiFiProperties.java:
> nifi.cluster.load.balance.address
>
> kind regards
> Jens
>
> Den fre. 11. jun. 2021 kl. 06.54 skrev Jens M. Kofoed <
> jmkofoed@gmail.com>:
>
>> I found out that another user has reported the same bug in JIRA:
>> https://issues.apache.org/jira/browse/NIFI-8643
>>
>> so I'm not the only one, seeing this problem.
>> Kind regards
>> Jens
>>
>> Den tor. 10. jun. 2021 kl. 20.08 skrev Jens M. Kofoed <
>> jmkofoed@gmail.com>:
>>
>>> I have also tried to not specify any load balance host:
>>> nifi.cluster.load.balance.host=
>>>
>>> In the documentation it says: "If not specified, will default to the
>>> value used by the nifi.cluster.node.address property." And this port is
>>> working and bound to 0.0.0.0.
>>>
>>> Kind regards
>>> Jens
>>>
>>> Den tor. 10. jun. 2021 kl. 19.51 skrev Jens M. Kofoed <
>>> jmkofoed@gmail.com>:
>>>
>>>> Joe, sorry if there is a space in the mail. There are no spaces in the
>>>> original config. I had to replace the original address with an anonymized
>>>> one.
>>>>
>>>> /jens
>>>>
>>>> Den 10. jun. 2021 kl. 19.31 skrev Joe Gresock :
>>>>
>>>> Jens,
>>>>
>>>> Can you try removing the space from the nifi.cluster.load.balance.host
>>>> property and see what happens?
>>>>
>>>> On Thu, Jun 10, 2021 at 1:23 PM Jens M. Kofoed 
>>>> wrote:
>>>>
>>>>> Dear Community
>>>>>
>>>>> I have installed and configured a 3 node secured NiFi cluster with
>>>>> NiFi 1.13.2, Java 8 on Ubuntu 20.04.
>>>>> I was wondering why the cluster didn't load balance flowfiles after I
>>>>> configured Round Robins between a ListFTP and FetchFTP Process. (Other
>>>>> mails earlier today: Round Robin not working NiFi)
>>>>> After many attempts to find and fix the issues I noticed that the load
>>>>> balance port 6342 was bound to localhost and not 0.0.0.0.
>>>>>
>>>>> > netstat -l
>>>>> Active Internet connections (only servers)
>>>>> Proto Recv-Q Send-Q Local Address   Foreign Address
>>>>> State
>>>>> tcp0  0 0.0.0.0:90900.0.0.0:*
>>>>> LISTEN
>>>>> tcp0  0 0.0.0.0:94430.0.0.0:*
>>>>> LISTEN
>>>>> tcp0  0 localhost:6342  0.0.0.0:*
>>>>> LISTEN
>>>>> tcp0  0 localhost:42603 0.0.0.0:*
>>>>> LISTEN
>>>>> tcp0  0 0.0.0.0:ssh 0.0.0.0:*
>>>>> LISTEN
>>>>> tcp0  0 node01.domain.com:8443 0.0.0.0:*
>>>>> LISTEN
>>>>> tcp6   0  0 localhost:42101 [::]:*
>>>>>  LISTEN
>>>>> tcp6   0  0 [::]:ssh[::]:*
>>>>>  LISTEN
>>>>> raw6   0  0 [::]:ipv6-icmp  [::]:*  7
>>>>>
>>>>> Part of the configuration is like this:
>>>>> nifi.web.https.host=node1.domain.com
>>>>> nifi.web.https.port=8443
>>>>> nifi.web.https.network.interface.default=ens192
>>>>> nifi.cluster.is.node=true
>>>>> nifi.cluster.node.address=node01.domain.com
>>>>> nifi.cluster.node.protocol.port=9443
>>>>> nifi.cluster.node.protocol.threads=10
>>>>> nifi.cluster.node.protocol.max.threads=50
>>>>> nifi.cluster.node.event.history.size=25
>>>>> nifi.cluster.node.connection.timeout=5 sec
>>>>> nifi.cluster.node.read.timeout=5 sec
>>>>> nifi.cluster.node.max.concurrent.requests=100
>>>>> nifi.cluster.firewall.file=
>>>>> nifi.cluster.flow.election.max.wait.ti

Re: Round Robin not working NiFi 1.13.2

2021-06-11 Thread Jens M. Kofoed
This is a bug. There is a mismatch between the documentation, nifi.properties
and the Java class handling the properties:
https://issues.apache.org/jira/browse/NIFI-8643
kind regards
Jens

Den tor. 10. jun. 2021 kl. 17.01 skrev Jens M. Kofoed <
jmkofoed@gmail.com>:

> In the beginning both parameters were set:
>
> nifi.cluster.is.node=true
> nifi.cluster.node.address=node01.domain.com
> nifi.cluster.node.protocol.port=9443
> nifi.cluster.node.protocol.threads=10
> nifi.cluster.node.protocol.max.threads=50
> nifi.cluster.node.event.history.size=25
> nifi.cluster.node.connection.timeout=5 sec
> nifi.cluster.node.read.timeout=5 sec
> nifi.cluster.node.max.concurrent.requests=100
> nifi.cluster.firewall.file=
> nifi.cluster.flow.election.max.wait.time=5 mins
> nifi.cluster.flow.election.max.candidates=3
>
> # cluster load balancing properties #
> nifi.cluster.load.balance.host= node01.domain.com
> nifi.cluster.load.balance.port=6342
> nifi.cluster.load.balance.connections.per.node=4
> nifi.cluster.load.balance.max.thread.count=8
> nifi.cluster.load.balance.comms.timeout=30 sec
>
> I have tested multiple combinations of host names. The "funny" part is
> that port 9443 binds to 0.0.0.0:
> netstat -l
> Active Internet connections (only servers)
> Proto Recv-Q Send-Q Local Address   Foreign Address State
> tcp0  0 0.0.0.0:90900.0.0.0:*   LISTEN
> tcp0  0 0.0.0.0:94430.0.0.0:*   LISTEN
> tcp0  0 localhost:6342  0.0.0.0:*   LISTEN
> tcp0  0 localhost:42603 0.0.0.0:*   LISTEN
> tcp0  0 0.0.0.0:ssh 0.0.0.0:*   LISTEN
> tcp0  0 node01.domain.com:8443 0.0.0.0:*   LISTEN
> tcp6   0  0 localhost:42101 [::]:*  LISTEN
> tcp6   0  0 [::]:ssh[::]:*  LISTEN
> raw6   0  0 [::]:ipv6-icmp  [::]:*  7
>
>
> regards
> Jens
>
> Den tor. 10. jun. 2021 kl. 16.53 skrev Joe Gresock :
>
>> Looking at the code, it appears that if nifi.cluster.load.balance.address
>> is not set, it falls back to choosing nifi.cluster.node.address.  If this
>> is not provided, it finally falls back to localhost.
>>
>> I'd recommend setting nifi.cluster.node.address at minimum, and you might
>> as well also set nifi.cluster.load.balance.address in order to be explicit.
>>
>> On Thu, Jun 10, 2021 at 10:45 AM Jens M. Kofoed 
>> wrote:
>>
>>> Hi Joe
>>>
>>> I just found out that port 6342 is bound to localhost. Why?
>>> In the last build NiFi binds to localhost by default if not
>>> specifying which interface to use:
>>> nifi.web.https.host=node1.domain.com
>>> nifi.web.https.port=8443
>>> nifi.web.https.network.interface.default=ens192   <- If this is not
>>> configured the UI is bound to localhost.
>>>
>>> But how can I configure port 6342 to bind to any interface?
>>>
>>> kind regards
>>> Jens
>>>
>>>
>>> Den tor. 10. jun. 2021 kl. 16.32 skrev Joe Gresock :
>>>
>>>> Ok, and just to confirm, you've verified that each node can talk to the
>>>> others over port 6342?
>>>>
>>>> On Thu, Jun 10, 2021 at 10:29 AM Jens M. Kofoed 
>>>> wrote:
>>>>
>>>>> I have the same error for node2 as well.
>>>>> All 3 nodes can talk to each other. If I use a remote process group
>>>>> and connect to an "remote" input port, everything works fine. This is a
>>>>> work around for round robin.
>>>>> My configuration for cluster load balance is the default.
>>>>> nifi.cluster.load.balance.host=
>>>>> nifi.cluster.load.balance.port=6342
>>>>> nifi.cluster.load.balance.connections.per.node=4
>>>>> nifi.cluster.load.balance.max.thread.count=8
>>>>> nifi.cluster.load.balance.comms.timeout=30 sec
>>>>>
>>>>> kind regards
>>>>> Jens
>>>>>
>>>>>
>>>>> Den tor. 10. jun. 2021 kl. 16.18 skrev Joe Gresock >>>> >:
>>>>>
>>>>>> That would seem to be the culprit :)  It sounds like your other nodes
>>>>>> can't connect to node3 over port 8443.  Have you verified that the port 
>>>>>> is
>>>>>> open?  Same question for all other ports configured in you

Re: BUG??? NiFi 1.13.2 Cluster Load Balance binds to localhost and is not accessible

2021-06-10 Thread Jens M. Kofoed
Adding this line in the nifi.properties gets it to work:
nifi.cluster.load.balance.address=0.0.0.0

There is a mismatch between nifi.properties and the Java file handling the
properties:
nifi.properties:  nifi.cluster.load.balance.host
nifi-commons/nifi-properties/src/main/java/org/apache/nifi/util/NiFiProperties.java:
nifi.cluster.load.balance.address

kind regards
Jens

Den fre. 11. jun. 2021 kl. 06.54 skrev Jens M. Kofoed <
jmkofoed@gmail.com>:

> I found out that another user has reported the same bug in JIRA:
> https://issues.apache.org/jira/browse/NIFI-8643
>
> so I'm not the only one, seeing this problem.
> Kind regards
> Jens
>
> Den tor. 10. jun. 2021 kl. 20.08 skrev Jens M. Kofoed <
> jmkofoed@gmail.com>:
>
>> I have also tried to not specify any load balance host:
>> nifi.cluster.load.balance.host=
>>
>> In the documentation it says: "If not specified, will default to the
>> value used by the nifi.cluster.node.address property." And this port is
>> working and bound to 0.0.0.0.
>>
>> Kind regards
>> Jens
>>
>> Den tor. 10. jun. 2021 kl. 19.51 skrev Jens M. Kofoed <
>> jmkofoed@gmail.com>:
>>
>>> Joe, sorry if there is a space in the mail. There are no spaces in the
>>> original config. I had to replace the original address with an anonymized
>>> one.
>>>
>>> /jens
>>>
>>> Den 10. jun. 2021 kl. 19.31 skrev Joe Gresock :
>>>
>>> Jens,
>>>
>>> Can you try removing the space from the nifi.cluster.load.balance.host
>>> property and see what happens?
>>>
>>> On Thu, Jun 10, 2021 at 1:23 PM Jens M. Kofoed 
>>> wrote:
>>>
>>>> Dear Community
>>>>
>>>> I have installed and configured a 3 node secured NiFi cluster with NiFi
>>>> 1.13.2, Java 8 on Ubuntu 20.04.
>>>> I was wondering why the cluster didn't load balance flowfiles after I
>>>> configured Round Robins between a ListFTP and FetchFTP Process. (Other
>>>> mails earlier today: Round Robin not working NiFi)
>>>> After many attempts to find and fix the issues I noticed that the load
>>>> balance port 6342 was bound to localhost and not 0.0.0.0.
>>>>
>>>> > netstat -l
>>>> Active Internet connections (only servers)
>>>> Proto Recv-Q Send-Q Local Address   Foreign Address
>>>> State
>>>> tcp0  0 0.0.0.0:90900.0.0.0:*
>>>> LISTEN
>>>> tcp0  0 0.0.0.0:94430.0.0.0:*
>>>> LISTEN
>>>> tcp0  0 localhost:6342  0.0.0.0:*
>>>> LISTEN
>>>> tcp0  0 localhost:42603 0.0.0.0:*
>>>> LISTEN
>>>> tcp0  0 0.0.0.0:ssh 0.0.0.0:*
>>>> LISTEN
>>>> tcp0  0 node01.domain.com:8443 0.0.0.0:*
>>>> LISTEN
>>>> tcp6   0  0 localhost:42101 [::]:*
>>>>  LISTEN
>>>> tcp6   0  0 [::]:ssh[::]:*
>>>>  LISTEN
>>>> raw6   0  0 [::]:ipv6-icmp  [::]:*  7
>>>>
>>>> Part of the configuration is like this:
>>>> nifi.web.https.host=node1.domain.com
>>>> nifi.web.https.port=8443
>>>> nifi.web.https.network.interface.default=ens192
>>>> nifi.cluster.is.node=true
>>>> nifi.cluster.node.address=node01.domain.com
>>>> nifi.cluster.node.protocol.port=9443
>>>> nifi.cluster.node.protocol.threads=10
>>>> nifi.cluster.node.protocol.max.threads=50
>>>> nifi.cluster.node.event.history.size=25
>>>> nifi.cluster.node.connection.timeout=5 sec
>>>> nifi.cluster.node.read.timeout=5 sec
>>>> nifi.cluster.node.max.concurrent.requests=100
>>>> nifi.cluster.firewall.file=
>>>> nifi.cluster.flow.election.max.wait.time=5 mins
>>>> nifi.cluster.flow.election.max.candidates=3
>>>>
>>>> # cluster load balancing properties #
>>>> nifi.cluster.load.balance.host= node01.domain.com
>>>> nifi.cluster.load.balance.port=6342
>>>> nifi.cluster.load.balance.connections.per.node=4
>>>> nifi.cluster.load.balance.max.thread.count=8
>>>> nifi.cluster.load.balance.comms.timeout=30 sec
>>>>
>>>> Errors in nifi-app.log
>>>> 2021-06-10 16:00:22,078 ERROR [Load-Balanced Client Thread-1]
>>>> org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClient
>>>> Unable to connect to node3.domain.com:8443 for load balancing
>>>> java.net.ConnectException: Connection refused
>>>> 2021-06-10 16:00:22,078 ERROR [Load-Balanced Client Thread-1]
>>>> org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClient
>>>> Unable to connect to node2.domain.com:8443 for load balancing
>>>> java.net.ConnectException: Connection refused
>>>>
>>>> The question is:
>>>> Why does the error messages say: Unable to connect to
>>>> node2.domain.com:8443 for load balancing
>>>> Shouldn't it be port 6342?
>>>> Why does this port bind to localhost while all other ports bind to
>>>> 0.0.0.0 or node01.domain.com???
>>>>
>>>> kind regards
>>>> Jens
>>>>
>>>


Re: BUG??? NiFi 1.13.2 Cluster Load Balance binds to localhost and is not accessible

2021-06-10 Thread Jens M. Kofoed
I found out that another user has reported the same bug in JIRA:
https://issues.apache.org/jira/browse/NIFI-8643

so I'm not the only one seeing this problem.
Kind regards
Jens

Den tor. 10. jun. 2021 kl. 20.08 skrev Jens M. Kofoed <
jmkofoed@gmail.com>:

> I have also tried to not specify any load balance host:
> nifi.cluster.load.balance.host=
>
> In the documentation it says: "If not specified, will default to the value
> used by the nifi.cluster.node.address property." And this port is working
> and bound to 0.0.0.0.
>
> Kind regards
> Jens
>
> Den tor. 10. jun. 2021 kl. 19.51 skrev Jens M. Kofoed <
> jmkofoed@gmail.com>:
>
>> Joe, sorry if there is a space in the mail. There are no spaces in the
>> original config. I had to replace the original address with an anonymized
>> one.
>>
>> /jens
>>
>> Den 10. jun. 2021 kl. 19.31 skrev Joe Gresock :
>>
>> Jens,
>>
>> Can you try removing the space from the nifi.cluster.load.balance.host
>> property and see what happens?
>>
>> On Thu, Jun 10, 2021 at 1:23 PM Jens M. Kofoed 
>> wrote:
>>
>>> Dear Community
>>>
>>> I have installed and configured a 3 node secured NiFi cluster with NiFi
>>> 1.13.2, Java 8 on Ubuntu 20.04.
>>> I was wondering why the cluster didn't load balance flowfiles after I
>>> configured Round Robins between a ListFTP and FetchFTP Process. (Other
>>> mails earlier today: Round Robin not working NiFi)
>>> After many attempts to find and fix the issues I noticed that the load
>>> balance port 6342 was bound to localhost and not 0.0.0.0.
>>>
>>> > netstat -l
>>> Active Internet connections (only servers)
>>> Proto Recv-Q Send-Q Local Address   Foreign Address State
>>> tcp0  0 0.0.0.0:90900.0.0.0:*
>>> LISTEN
>>> tcp0  0 0.0.0.0:94430.0.0.0:*
>>> LISTEN
>>> tcp0  0 localhost:6342  0.0.0.0:*
>>> LISTEN
>>> tcp0  0 localhost:42603 0.0.0.0:*
>>> LISTEN
>>> tcp0  0 0.0.0.0:ssh 0.0.0.0:*
>>> LISTEN
>>> tcp0  0 node01.domain.com:8443 0.0.0.0:*
>>> LISTEN
>>> tcp6   0  0 localhost:42101 [::]:*
>>>  LISTEN
>>> tcp6   0  0 [::]:ssh[::]:*
>>>  LISTEN
>>> raw6   0  0 [::]:ipv6-icmp  [::]:*  7
>>>
>>> Part of the configuration is like this:
>>> nifi.web.https.host=node1.domain.com
>>> nifi.web.https.port=8443
>>> nifi.web.https.network.interface.default=ens192
>>> nifi.cluster.is.node=true
>>> nifi.cluster.node.address=node01.domain.com
>>> nifi.cluster.node.protocol.port=9443
>>> nifi.cluster.node.protocol.threads=10
>>> nifi.cluster.node.protocol.max.threads=50
>>> nifi.cluster.node.event.history.size=25
>>> nifi.cluster.node.connection.timeout=5 sec
>>> nifi.cluster.node.read.timeout=5 sec
>>> nifi.cluster.node.max.concurrent.requests=100
>>> nifi.cluster.firewall.file=
>>> nifi.cluster.flow.election.max.wait.time=5 mins
>>> nifi.cluster.flow.election.max.candidates=3
>>>
>>> # cluster load balancing properties #
>>> nifi.cluster.load.balance.host= node01.domain.com
>>> nifi.cluster.load.balance.port=6342
>>> nifi.cluster.load.balance.connections.per.node=4
>>> nifi.cluster.load.balance.max.thread.count=8
>>> nifi.cluster.load.balance.comms.timeout=30 sec
>>>
>>> Errors in nifi-app.log
>>> 2021-06-10 16:00:22,078 ERROR [Load-Balanced Client Thread-1]
>>> org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClient
>>> Unable to connect to node3.domain.com:8443 for load balancing
>>> java.net.ConnectException: Connection refused
>>> 2021-06-10 16:00:22,078 ERROR [Load-Balanced Client Thread-1]
>>> org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClient
>>> Unable to connect to node2.domain.com:8443 for load balancing
>>> java.net.ConnectException: Connection refused
>>>
>>> The question is:
>>> Why does the error messages say: Unable to connect to
>>> node2.domain.com:8443 for load balancing
>>> Shouldn't it be port 6342?
>>> Why does this port bind to localhost while all other ports bind to
>>> 0.0.0.0 or node01.domain.com???
>>>
>>> kind regards
>>> Jens
>>>
>>


Re: BUG??? NiFi 1.13.2 Cluster Load Balance binds to localhost and is not accessible

2021-06-10 Thread Jens M. Kofoed
I have also tried to not specify any load balance host:
nifi.cluster.load.balance.host=

In the documentation it says: "If not specified, will default to the value
used by the nifi.cluster.node.address property." And this port is working
and bound to 0.0.0.0.

Kind regards
Jens

Den tor. 10. jun. 2021 kl. 19.51 skrev Jens M. Kofoed <
jmkofoed@gmail.com>:

> Joe, sorry if there is a space in the mail. There are no spaces in the
> original config. I had to replace the original address with an anonymized
> one.
>
> /jens
>
> Den 10. jun. 2021 kl. 19.31 skrev Joe Gresock :
>
> Jens,
>
> Can you try removing the space from the nifi.cluster.load.balance.host
> property and see what happens?
>
> On Thu, Jun 10, 2021 at 1:23 PM Jens M. Kofoed 
> wrote:
>
>> Dear Community
>>
>> I have installed and configured a 3 node secured NiFi cluster with NiFi
>> 1.13.2, Java 8 on Ubuntu 20.04.
>> I was wondering why the cluster didn't load balance flowfiles after I
>> configured Round Robins between a ListFTP and FetchFTP Process. (Other
>> mails earlier today: Round Robin not working NiFi)
>> After many attempts to find and fix the issues I noticed that the load
>> balance port 6342 was bound to localhost and not 0.0.0.0.
>>
>> > netstat -l
>> Active Internet connections (only servers)
>> Proto Recv-Q Send-Q Local Address   Foreign Address State
>> tcp0  0 0.0.0.0:90900.0.0.0:*
>> LISTEN
>> tcp0  0 0.0.0.0:94430.0.0.0:*
>> LISTEN
>> tcp0  0 localhost:6342  0.0.0.0:*
>> LISTEN
>> tcp0  0 localhost:42603 0.0.0.0:*
>> LISTEN
>> tcp0  0 0.0.0.0:ssh 0.0.0.0:*
>> LISTEN
>> tcp0  0 node01.domain.com:8443 0.0.0.0:*   LISTEN
>> tcp6   0  0 localhost:42101 [::]:*  LISTEN
>> tcp6   0  0 [::]:ssh[::]:*  LISTEN
>> raw6   0  0 [::]:ipv6-icmp  [::]:*  7
>>
>> Part of the configuration is like this:
>> nifi.web.https.host=node1.domain.com
>> nifi.web.https.port=8443
>> nifi.web.https.network.interface.default=ens192
>> nifi.cluster.is.node=true
>> nifi.cluster.node.address=node01.domain.com
>> nifi.cluster.node.protocol.port=9443
>> nifi.cluster.node.protocol.threads=10
>> nifi.cluster.node.protocol.max.threads=50
>> nifi.cluster.node.event.history.size=25
>> nifi.cluster.node.connection.timeout=5 sec
>> nifi.cluster.node.read.timeout=5 sec
>> nifi.cluster.node.max.concurrent.requests=100
>> nifi.cluster.firewall.file=
>> nifi.cluster.flow.election.max.wait.time=5 mins
>> nifi.cluster.flow.election.max.candidates=3
>>
>> # cluster load balancing properties #
>> nifi.cluster.load.balance.host= node01.domain.com
>> nifi.cluster.load.balance.port=6342
>> nifi.cluster.load.balance.connections.per.node=4
>> nifi.cluster.load.balance.max.thread.count=8
>> nifi.cluster.load.balance.comms.timeout=30 sec
>>
>> Errors in nifi-app.log
>> 2021-06-10 16:00:22,078 ERROR [Load-Balanced Client Thread-1]
>> org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClient
>> Unable to connect to node3.domain.com:8443 for load balancing
>> java.net.ConnectException: Connection refused
>> 2021-06-10 16:00:22,078 ERROR [Load-Balanced Client Thread-1]
>> org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClient
>> Unable to connect to node2.domain.com:8443 for load balancing
>> java.net.ConnectException: Connection refused
>>
>> The question is:
>> Why does the error messages say: Unable to connect to
>> node2.domain.com:8443 for load balancing
>> Shouldn't it be port 6342?
>> Why does this port bind to localhost while all other ports bind to
>> 0.0.0.0 or node01.domain.com???
>>
>> kind regards
>> Jens
>>
>


Re: BUG??? NiFi 1.13.2 Cluster Load Balance binds to localhost and is not accessible

2021-06-10 Thread Jens M. Kofoed
Joe, sorry if there is a space in the mail. There are no spaces in the original
config. I had to replace the original address with an anonymized one.

/jens 

> Den 10. jun. 2021 kl. 19.31 skrev Joe Gresock :
> 
> Jens,
> 
> Can you try removing the space from the nifi.cluster.load.balance.host 
> property and see what happens?
> 
>> On Thu, Jun 10, 2021 at 1:23 PM Jens M. Kofoed  
>> wrote:
>> Dear Community
>> 
>> I have installed and configured a 3 node secured NiFi cluster with NiFi 
>> 1.13.2, Java 8 on Ubuntu 20.04.
>> I was wondering why the cluster didn't load balance flowfiles after I 
>> configured Round Robins between a ListFTP and FetchFTP Process. (Other mails 
>> earlier today: Round Robin not working NiFi)
>> After many attempts to find and fix the issues I noticed that the load
>> balance port 6342 was bound to localhost and not 0.0.0.0.
>> 
>> > netstat -l
>> Active Internet connections (only servers)
>> Proto Recv-Q Send-Q Local Address   Foreign Address State
>> tcp0  0 0.0.0.0:90900.0.0.0:*   LISTEN
>> tcp0  0 0.0.0.0:94430.0.0.0:*   LISTEN
>> tcp0  0 localhost:6342  0.0.0.0:*   LISTEN
>> tcp0  0 localhost:42603 0.0.0.0:*   LISTEN
>> tcp0  0 0.0.0.0:ssh 0.0.0.0:*   LISTEN
>> tcp0  0 node01.domain.com:8443 0.0.0.0:*   LISTEN
>> tcp6   0  0 localhost:42101 [::]:*  LISTEN
>> tcp6   0  0 [::]:ssh[::]:*  LISTEN
>> raw6   0  0 [::]:ipv6-icmp  [::]:*  7
>> 
>> Part of the configuration is like this:
>> nifi.web.https.host=node1.domain.com
>> nifi.web.https.port=8443
>> nifi.web.https.network.interface.default=ens192 
>> nifi.cluster.is.node=true
>> nifi.cluster.node.address=node01.domain.com
>> nifi.cluster.node.protocol.port=9443
>> nifi.cluster.node.protocol.threads=10
>> nifi.cluster.node.protocol.max.threads=50
>> nifi.cluster.node.event.history.size=25
>> nifi.cluster.node.connection.timeout=5 sec
>> nifi.cluster.node.read.timeout=5 sec
>> nifi.cluster.node.max.concurrent.requests=100
>> nifi.cluster.firewall.file=
>> nifi.cluster.flow.election.max.wait.time=5 mins
>> nifi.cluster.flow.election.max.candidates=3
>> 
>> # cluster load balancing properties #
>> nifi.cluster.load.balance.host= node01.domain.com 
>> nifi.cluster.load.balance.port=6342
>> nifi.cluster.load.balance.connections.per.node=4
>> nifi.cluster.load.balance.max.thread.count=8
>> nifi.cluster.load.balance.comms.timeout=30 sec
>> 
>> Errors in nifi-app.log
>> 2021-06-10 16:00:22,078 ERROR [Load-Balanced Client Thread-1] 
>> org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClient
>>  Unable to connect to node3.domain.com:8443 for load balancing
>> java.net.ConnectException: Connection refused 
>> 2021-06-10 16:00:22,078 ERROR [Load-Balanced Client Thread-1] 
>> org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClient
>>  Unable to connect to node2.domain.com:8443 for load balancing
>> java.net.ConnectException: Connection refused
>> 
>> The question is:
>> Why does the error messages say: Unable to connect to node2.domain.com:8443 
>> for load balancing 
>> Shouldn't it be port 6342?
>> Why does this port bind to localhost while all other ports bind to 0.0.0.0 
>> or node01.domain.com???
>> 
>> kind regards
>> Jens


BUG??? NiFi 1.13.2 Cluster Load Balance binds to localhost and is not accessible

2021-06-10 Thread Jens M. Kofoed
Dear Community

I have installed and configured a 3 node secured NiFi cluster with NiFi
1.13.2, Java 8 on Ubuntu 20.04.
I was wondering why the cluster didn't load-balance flowfiles after I
configured Round Robin between a ListFTP and a FetchFTP processor. (Other
mails earlier today: Round Robin not working NiFi)
After many attempts to find and fix the issue, I noticed that the load-balance
port 6342 was bound to localhost and not 0.0.0.0.

> netstat -l
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address  State
tcp        0      0 0.0.0.0:9090            0.0.0.0:*        LISTEN
tcp        0      0 0.0.0.0:9443            0.0.0.0:*        LISTEN
tcp        0      0 localhost:6342          0.0.0.0:*        LISTEN
tcp        0      0 localhost:42603         0.0.0.0:*        LISTEN
tcp        0      0 0.0.0.0:ssh             0.0.0.0:*        LISTEN
tcp        0      0 node01.domain.com:8443  0.0.0.0:*        LISTEN
tcp6       0      0 localhost:42101         [::]:*           LISTEN
tcp6       0      0 [::]:ssh                [::]:*           LISTEN
raw6       0      0 [::]:ipv6-icmp          [::]:*           7

Part of the configuration is like this:
nifi.web.https.host=node1.domain.com
nifi.web.https.port=8443
nifi.web.https.network.interface.default=ens192
nifi.cluster.is.node=true
nifi.cluster.node.address=node01.domain.com
nifi.cluster.node.protocol.port=9443
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=3

# cluster load balancing properties #
nifi.cluster.load.balance.host= node01.domain.com
nifi.cluster.load.balance.port=6342
nifi.cluster.load.balance.connections.per.node=4
nifi.cluster.load.balance.max.thread.count=8
nifi.cluster.load.balance.comms.timeout=30 sec

Errors in nifi-app.log
2021-06-10 16:00:22,078 ERROR [Load-Balanced Client Thread-1]
org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClient
Unable to connect to node3.domain.com:8443 for load balancing
java.net.ConnectException: Connection refused
2021-06-10 16:00:22,078 ERROR [Load-Balanced Client Thread-1]
org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClient
Unable to connect to node2.domain.com:8443 for load balancing
java.net.ConnectException: Connection refused

The question is:
Why do the error messages say "Unable to connect to node2.domain.com:8443
for load balancing"?
Shouldn't it be port 6342?
Why does this port bind to localhost while all other ports bind to 0.0.0.0
or node01.domain.com???

kind regards
Jens


Re: Round Robin not working NiFi 1.13.2

2021-06-10 Thread Jens M. Kofoed
In the beginning, both parameters were set:

nifi.cluster.is.node=true
nifi.cluster.node.address=node01.domain.com
nifi.cluster.node.protocol.port=9443
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=3

# cluster load balancing properties #
nifi.cluster.load.balance.host= node01.domain.com
nifi.cluster.load.balance.port=6342
nifi.cluster.load.balance.connections.per.node=4
nifi.cluster.load.balance.max.thread.count=8
nifi.cluster.load.balance.comms.timeout=30 sec

I have tested multiple combinations of host names. The "funny" part is that
port 9443 binds to 0.0.0.0:
netstat -l
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address  State
tcp        0      0 0.0.0.0:9090            0.0.0.0:*        LISTEN
tcp        0      0 0.0.0.0:9443            0.0.0.0:*        LISTEN
tcp        0      0 localhost:6342          0.0.0.0:*        LISTEN
tcp        0      0 localhost:42603         0.0.0.0:*        LISTEN
tcp        0      0 0.0.0.0:ssh             0.0.0.0:*        LISTEN
tcp        0      0 node01.domain.com:8443  0.0.0.0:*        LISTEN
tcp6       0      0 localhost:42101         [::]:*           LISTEN
tcp6       0      0 [::]:ssh                [::]:*           LISTEN
raw6       0      0 [::]:ipv6-icmp          [::]:*           7


regards
Jens

Den tor. 10. jun. 2021 kl. 16.53 skrev Joe Gresock :

> Looking at the code, it appears that if nifi.cluster.load.balance.address
> is not set, it falls back to choosing nifi.cluster.node.address.  If this
> is not provided, it finally falls back to localhost.
>
> I'd recommend setting nifi.cluster.node.address at minimum, and you might
> as well also set nifi.cluster.load.balance.address in order to be explicit.
>
> On Thu, Jun 10, 2021 at 10:45 AM Jens M. Kofoed 
> wrote:
>
>> Hi Joe
>>
>> I just found out that port 6342 is bound to localhost. Why
>> In the last build NIFI is bound to localhost as standard if not
>> specifying which interface to use:
>> nifi.web.https.host=node1.domain.com
>> nifi.web.https.port=8443
>> nifi.web.https.network.interface.default=ens192   <- If this is not
>> configured the UI is bound to localhost.
>>
>> But how can I configure port 6342 to bound to any interface???
>>
>> kind regards
>> Jens
>>
>>
>> Den tor. 10. jun. 2021 kl. 16.32 skrev Joe Gresock :
>>
>>> Ok, and just to confirm, you've verified that each node can talk to the
>>> others over port 6342?
>>>
>>> On Thu, Jun 10, 2021 at 10:29 AM Jens M. Kofoed 
>>> wrote:
>>>
>>>> I have the same error for node2 as well.
>>>> All 3 nodes can talk to each other. If I use a remote process group and
>>>> connect to an "remote" input port, everything works fine. This is a work
>>>> around for round robin.
>>>> My configuration for cluster load balance is the default.
>>>> nifi.cluster.load.balance.host=
>>>> nifi.cluster.load.balance.port=6342
>>>> nifi.cluster.load.balance.connections.per.node=4
>>>> nifi.cluster.load.balance.max.thread.count=8
>>>> nifi.cluster.load.balance.comms.timeout=30 sec
>>>>
>>>> kind regards
>>>> Jens
>>>>
>>>>
>>>> Den tor. 10. jun. 2021 kl. 16.18 skrev Joe Gresock >>> >:
>>>>
>>>>> That would seem to be the culprit :)  It sounds like your other nodes
>>>>> can't connect to node3 over port 8443.  Have you verified that the port is
>>>>> open?  Same question for all other ports configured in your 
>>>>> nifi.properties.
>>>>>
>>>>> On Thu, Jun 10, 2021 at 10:08 AM Jens M. Kofoed <
>>>>> jmkofoed@gmail.com> wrote:
>>>>>
>>>>>> Hi Joe
>>>>>>
>>>>>> Thanks for replaying :-)
>>>>>> Looking at status history for the fetchFTP and all the other
>>>>>> processers in the flow it is only the primary node which has processed
>>>>>> flowfiles.
>>>>>> I have created clusters before with no issues, but there must be
>>>>>> something tricky which I'm missing.
>>>>>>
>>>>>> I found thi

Re: Round Robin not working NiFi 1.13.2

2021-06-10 Thread Jens M. Kofoed
Hi Joe

I just found out that port 6342 is bound to localhost. Why?
In the latest build, NiFi binds to localhost by default if you do not specify
which interface to use:
nifi.web.https.host=node1.domain.com
nifi.web.https.port=8443
nifi.web.https.network.interface.default=ens192   <- If this is not
configured, the UI is bound to localhost.

But how can I configure port 6342 to bind to any interface?
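
A sketch of the explicit settings, assuming (as Joe notes elsewhere in this
thread) that the bind address falls back from nifi.cluster.load.balance.host
to nifi.cluster.node.address and finally to localhost; the values are
placeholders, with no stray whitespace around them:

nifi.cluster.node.address=node01.domain.com
nifi.cluster.load.balance.host=node01.domain.com
nifi.cluster.load.balance.port=6342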

kind regards
Jens


Den tor. 10. jun. 2021 kl. 16.32 skrev Joe Gresock :

> Ok, and just to confirm, you've verified that each node can talk to the
> others over port 6342?
>
> On Thu, Jun 10, 2021 at 10:29 AM Jens M. Kofoed 
> wrote:
>
>> I have the same error for node2 as well.
>> All 3 nodes can talk to each other. If I use a remote process group and
>> connect to an "remote" input port, everything works fine. This is a work
>> around for round robin.
>> My configuration for cluster load balance is the default.
>> nifi.cluster.load.balance.host=
>> nifi.cluster.load.balance.port=6342
>> nifi.cluster.load.balance.connections.per.node=4
>> nifi.cluster.load.balance.max.thread.count=8
>> nifi.cluster.load.balance.comms.timeout=30 sec
>>
>> kind regards
>> Jens
>>
>>
>> Den tor. 10. jun. 2021 kl. 16.18 skrev Joe Gresock :
>>
>>> That would seem to be the culprit :)  It sounds like your other nodes
>>> can't connect to node3 over port 8443.  Have you verified that the port is
>>> open?  Same question for all other ports configured in your nifi.properties.
>>>
>>> On Thu, Jun 10, 2021 at 10:08 AM Jens M. Kofoed 
>>> wrote:
>>>
>>>> Hi Joe
>>>>
>>>> Thanks for replaying :-)
>>>> Looking at status history for the fetchFTP and all the other processers
>>>> in the flow it is only the primary node which has processed flowfiles.
>>>> I have created clusters before with no issues, but there must be
>>>> something tricky which I'm missing.
>>>>
>>>> I found this error in the log which explain why it is only the primary
>>>> node
>>>> 2021-06-10 16:00:22,078 ERROR [Load-Balanced Client Thread-1]
>>>> org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClient
>>>> Unable to connect to node3.domain.com:8443 for load balancing
>>>> java.net.ConnectException: Connection refused
>>>>
>>>> But I don't know why the Connection should be refused. I can't find any
>>>> other errors about connections. And know I have added the node group into
>>>> all policies, so all nodes should have all access rights.
>>>>
>>>> Any advice for future investigation?
>>>>
>>>> kind regards
>>>> Jens
>>>>
>>>> Den tor. 10. jun. 2021 kl. 15.36 skrev Joe Gresock >>> >:
>>>>
>>>>> Hi Jens,
>>>>>
>>>>> Out of curiosity, when you run the FetchFTP processor, what does the
>>>>> Status History of that processor show?  Is the processor processing files
>>>>> on all of your nodes or just the primary?
>>>>>
>>>>> On Thu, Jun 10, 2021 at 9:07 AM Jens M. Kofoed 
>>>>> wrote:
>>>>>
>>>>>> Dear community
>>>>>>
>>>>>> I have created a 3 node cluster with NiFi 1.13.2, java 8 on a ubuntu
>>>>>> 20.04.
>>>>>> I have a ListFTP Process running on primary node only -> FetchFTP
>>>>>> with Round Robin on the connection. But if I stop the FetchFTP Process 
>>>>>> and
>>>>>> looking at the queue all flowfiles are listed to be on the same node. 
>>>>>> Which
>>>>>> is also the primary node.
>>>>>>
>>>>>> Just for testing purpose, I've tried to set round robin on other
>>>>>> connection but all files stays on primary node. I have been looking in 
>>>>>> the
>>>>>> logs but can't find any errors yet.
>>>>>>
>>>>>> Please advice?
>>>>>> kind regards
>>>>>> Jens
>>>>>>
>>>>>


Re: Round Robin not working NiFi 1.13.2

2021-06-10 Thread Jens M. Kofoed
I have the same error for node2 as well.
All 3 nodes can talk to each other. If I use a remote process group and
connect to a "remote" input port, everything works fine. This is a
workaround for round robin.
My configuration for cluster load balance is the default.
nifi.cluster.load.balance.host=
nifi.cluster.load.balance.port=6342
nifi.cluster.load.balance.connections.per.node=4
nifi.cluster.load.balance.max.thread.count=8
nifi.cluster.load.balance.comms.timeout=30 sec

kind regards
Jens


Den tor. 10. jun. 2021 kl. 16.18 skrev Joe Gresock :

> That would seem to be the culprit :)  It sounds like your other nodes
> can't connect to node3 over port 8443.  Have you verified that the port is
> open?  Same question for all other ports configured in your nifi.properties.
>
> On Thu, Jun 10, 2021 at 10:08 AM Jens M. Kofoed 
> wrote:
>
>> Hi Joe
>>
>> Thanks for replaying :-)
>> Looking at status history for the fetchFTP and all the other processers
>> in the flow it is only the primary node which has processed flowfiles.
>> I have created clusters before with no issues, but there must be
>> something tricky which I'm missing.
>>
>> I found this error in the log which explain why it is only the primary
>> node
>> 2021-06-10 16:00:22,078 ERROR [Load-Balanced Client Thread-1]
>> org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClient
>> Unable to connect to node3.domain.com:8443 for load balancing
>> java.net.ConnectException: Connection refused
>>
>> But I don't know why the Connection should be refused. I can't find any
>> other errors about connections. And know I have added the node group into
>> all policies, so all nodes should have all access rights.
>>
>> Any advice for future investigation?
>>
>> kind regards
>> Jens
>>
>> Den tor. 10. jun. 2021 kl. 15.36 skrev Joe Gresock :
>>
>>> Hi Jens,
>>>
>>> Out of curiosity, when you run the FetchFTP processor, what does the
>>> Status History of that processor show?  Is the processor processing files
>>> on all of your nodes or just the primary?
>>>
>>> On Thu, Jun 10, 2021 at 9:07 AM Jens M. Kofoed 
>>> wrote:
>>>
>>>> Dear community
>>>>
>>>> I have created a 3 node cluster with NiFi 1.13.2, java 8 on a ubuntu
>>>> 20.04.
>>>> I have a ListFTP Process running on primary node only -> FetchFTP with
>>>> Round Robin on the connection. But if I stop the FetchFTP Process and
>>>> looking at the queue all flowfiles are listed to be on the same node. Which
>>>> is also the primary node.
>>>>
>>>> Just for testing purpose, I've tried to set round robin on other
>>>> connection but all files stays on primary node. I have been looking in the
>>>> logs but can't find any errors yet.
>>>>
>>>> Please advice?
>>>> kind regards
>>>> Jens
>>>>
>>>


Re: Round Robin not working NiFi 1.13.2

2021-06-10 Thread Jens M. Kofoed
Hi Joe

Thanks for replying :-)
Looking at the status history for the FetchFTP and all the other processors in
the flow, it is only the primary node which has processed flowfiles.
I have created clusters before with no issues, but there must be something
tricky which I'm missing.

I found this error in the log which explains why it is only the primary node:
2021-06-10 16:00:22,078 ERROR [Load-Balanced Client Thread-1]
org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClient
Unable to connect to node3.domain.com:8443 for load balancing
java.net.ConnectException: Connection refused

But I don't know why the connection should be refused. I can't find any
other errors about connections. And now I have added the node group to
all policies, so all nodes should have all access rights.

Any advice for further investigation?

kind regards
Jens

Den tor. 10. jun. 2021 kl. 15.36 skrev Joe Gresock :

> Hi Jens,
>
> Out of curiosity, when you run the FetchFTP processor, what does the
> Status History of that processor show?  Is the processor processing files
> on all of your nodes or just the primary?
>
> On Thu, Jun 10, 2021 at 9:07 AM Jens M. Kofoed 
> wrote:
>
>> Dear community
>>
>> I have created a 3 node cluster with NiFi 1.13.2, java 8 on a ubuntu
>> 20.04.
>> I have a ListFTP Process running on primary node only -> FetchFTP with
>> Round Robin on the connection. But if I stop the FetchFTP Process and
>> looking at the queue all flowfiles are listed to be on the same node. Which
>> is also the primary node.
>>
>> Just for testing purpose, I've tried to set round robin on other
>> connection but all files stays on primary node. I have been looking in the
>> logs but can't find any errors yet.
>>
>> Please advice?
>> kind regards
>> Jens
>>
>


Round Robin not working NiFi 1.13.2

2021-06-10 Thread Jens M. Kofoed
Dear community

I have created a 3-node cluster with NiFi 1.13.2, Java 8, on Ubuntu 20.04.
I have a ListFTP processor running on the primary node only -> FetchFTP with
Round Robin on the connection. But if I stop the FetchFTP processor and
look at the queue, all flowfiles are listed as being on the same node, which
is also the primary node.

Just for testing purposes, I've tried to set round robin on other connections,
but all files stay on the primary node. I have been looking in the logs but
can't find any errors yet.

Please advise?
kind regards
Jens


Need help to insert complete content of a file into JSONB field in postgres

2021-06-09 Thread Jens M. Kofoed
Dear community

I'm struggling with inserting files into a Postgres DB. It is the whole
content of a file which has to be inserted into one field of one record,
not each line/record into its own field.
The PutSQL processor expects the content of an incoming FlowFile to be the SQL
command to execute, not data to add to the database.
I managed to use a ReplaceText processor to rewrite the content into:
INSERT INTO tablename (content)
VALUES ('$1')

where $1 is equal to the whole content. But the content has special
characters, so the insert is failing.
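
One idea I am looking at (a sketch only, untested; the attribute name
file.body and the table name are placeholders) is to let PutSQL bind the
content as a JDBC parameter instead of inlining it, using PutSQL's
sql.args.N.type / sql.args.N.value attributes:

ExtractText (capture the whole content into an attribute; raise
"Maximum Capture Group Length" above its 1024 default, and mind the heap,
since this copies the content into an attribute):
  file.body = (?s)(^.*$)

UpdateAttribute (declare the JDBC parameter; 12 = java.sql.Types.VARCHAR):
  sql.args.1.type  = 12
  sql.args.1.value = ${file.body}

ReplaceText (Replacement Strategy = Always Replace) to make the content:
  INSERT INTO tablename (content) VALUES (?::jsonb)

PutSQL then binds the file body as a parameter, so quotes and other special
characters no longer break the statement.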

Please, any advice on how to insert the whole content of a file into one record/field?

kind regards
Jens


Re: Creating recursive missing folders with PutSmbFile

2021-04-20 Thread Jens M. Kofoed
Hi Mark

I'm using NiFi version 1.13.2 and I have fetched the main branch from git
version 1.14.0-SNAPSHOT.
I went into the folder nifi-nar-bundles/nifi-smb-bundle/ and executed the
command: mvn -T C2.0 clean install -Ddir-only
I copied the newly created NAR file from the nifi-smb-nar/target directory
(can't remember the full path) and saved it in the
/opt/nifi/nifi-current/extension folder.

After a minute or two I added a new processor to the canvas. In the
processor dialog window I could see 2 PutSmbFile processors: a 1.13.2
version and a 1.14.0-SNAPSHOT version. I added the 1.14.0-SNAPSHOT to the
canvas to test it. But it still doesn't create missing recursive
directories.

Now I would like to test the fix jayaaditya <https://github.com/jayaaditya>
made in https://github.com/apache/nifi/pull/4585/files fixing the reported
issue in https://issues.apache.org/jira/projects/NIFI/issues/NIFI-7863

I'm very new to git and maven, but I write C# and Python, so after trying to
figure out how to fetch the PR I manually changed the Java files on my
computer and tried to make a new build. But unfortunately it is still not
working.
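
For reference, a GitHub PR can be checked out directly instead of editing
files by hand (a sketch; pull/<id>/head is GitHub's read-only ref for PR
4585, and the local branch name pr-4585 is arbitrary):

git fetch https://github.com/apache/nifi.git pull/4585/head:pr-4585
git checkout pr-4585
cd nifi-nar-bundles/nifi-smb-bundle
mvn -T C2.0 clean install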

I didn't see any error messages like you describe. But after updating the
NAR file in the extension folder and restarting NiFi I could see
information that the 1.14.0-SNAPSHOT NAR file had changed and that it loaded
the new version.

So I would be very, very grateful for any help. I have to distribute files
to many different Windows servers, and my Windows administrators hate to
set up new FTP services every time a new server comes in place. So the
PutSmbFile processor is very much needed.

Next are some minor changes to GetSmbFile (
https://issues.apache.org/jira/projects/NIFI/issues/NIFI-7908) which I will
try to see if I can fix.

Kind regards
Jens M. Kofoed


Den tir. 20. apr. 2021 kl. 15.26 skrev Mark Payne :

> Jens,
>
> What version of NiFi did you deploy it to? What is the version of the
> processor that you built? My guess, given what I’ve read here is that you
> built the NAR and copied it over but then created an instance of the
> Processor using the old version of the NAR.
>
> Would recommend you remove the version of the NAR that was previously
> there. For example, maybe you have a 1.13.2 version and a 1.14.0-SNAPSHOT
> version that you built. Remove the 1.13.2 version from the lib/ directory
> and restart. There are other ways to handle this by changing the version of
> the processor, etc. but this is probably the easiest route if just
> verifying the changes.
>
> I’d also recommend grepping the logs for the line "Successfully created
> class loaders” and seeing if there are any NARs that were skipped. For
> example, do you see something like:
>
> Successfully created class loaders for 109 NARs, 1 were skipped
>
> That would indicate that it did not properly load the NAR, because it has
> a dependency on another NAR, and it couldn’t find that dependency.
>
> Thanks
> -Mark
>
>
>
> > On Apr 20, 2021, at 2:20 AM, Jens M. Kofoed 
> wrote:
> >
> > Hi
> >
> > I need some help/guides on how to Fetch and build the PR-4585 for
> NIFI-7863 so I can build it and test it.
> >
> > I have created a new VM with Ubuntu 20. Installed Maven, GIT and Java.
> > I used GIT to sync the git-wip-us.apache.org/repos/asf/nifi.git
> > I tried to make a full build, but 2 NAR's failed.
> >
> > I went into the folder for the SMB processors, and build the NAR for
> that specific folder which works.
> > I copied the new NAR file to my NIFI server for testing the new
> processor, but I can see it does not include the change made for the PR
> >
> > Kind regards
> > Jens M. Kofoed
> >
> >
> > Den man. 22. mar. 2021 kl. 20.03 skrev Jens M. Kofoed <
> jmkofoed@gmail.com>:
> > Dear Mark
> >
> > I would love to help, testing the PR to check that it is working. My
> only problem is I'm not able to build the nar files my self. So if someone
> can build the file, I have no problems testing the file.
> >
> > About SMB:
> > I don't like to mount networks drives locally to my NiFi servers which
> is running Linux. For accessing linux servers I am using SFTP, the big
> issue are with Windows servers and those who administrate them. They like
> so use Windows shares, and use AD access rights. I have tried to setup a
> NFS share in Windows server 2019, but it was not easy to get it to work and
> diffidently not then we had to make changes. So i order to get access to
> Windows servers, I have configured MS IIS with a FTP Site.
> > To be able to use smb shares, I hope it would be much easier.
> >
> > kind regards
> > Jens M. Kofoed
> >
> >
> >
> > Den man. 22.

Re: Creating recursive missing folders with PutSmbFile

2021-04-19 Thread Jens M. Kofoed
Hi

I need some help/guides on how to Fetch and build the PR-4585 for NIFI-7863
so I can build it and test it.

I have created a new VM with Ubuntu 20. Installed Maven, GIT and Java.
I used GIT to sync the git-wip-us.apache.org/repos/asf/nifi.git
I tried to make a full build, but 2 NAR's failed.

I went into the folder for the SMB processors, and build the NAR for that
specific folder which works.
I copied the new NAR file to my NIFI server for testing the new processor,
but I can see it does not include the change made for the PR

Kind regards
Jens M. Kofoed


Den man. 22. mar. 2021 kl. 20.03 skrev Jens M. Kofoed <
jmkofoed@gmail.com>:

> Dear Mark
>
> I would love to help, testing the PR to check that it is working. My only
> problem is I'm not able to build the nar files my self. So if someone can
> build the file, I have no problems testing the file.
>
> About SMB:
> I don't like to mount networks drives locally to my NiFi servers which is
> running Linux. For accessing linux servers I am using SFTP, the big issue
> are with Windows servers and those who administrate them. They like so use
> Windows shares, and use AD access rights. I have tried to setup a NFS share
> in Windows server 2019, but it was not easy to get it to work and
> diffidently not then we had to make changes. So i order to get access to
> Windows servers, I have configured MS IIS with a FTP Site.
> To be able to use smb shares, I hope it would be much easier.
>
> kind regards
> Jens M. Kofoed
>
>
>
> Den man. 22. mar. 2021 kl. 19.19 skrev Joe Witt :
>
>> pretty sure SMB is super popular - it is just that for the cases we
>> typically engage in SMB isn't used as the protocol to access data :)
>>
>> Agree with the rest of that
>>
>> Thanks
>>
>> On Mon, Mar 22, 2021 at 11:13 AM Mark Payne  wrote:
>> >
>> > Jens,
>> >
>> > In order to review & merge a PR, there are two important things that
>> need to happen:
>> >
>> > 1. A NiFi committer must review the code to make sure that the changes
>> are safe, make sense, conducive with the architecture, is adhering to best
>> practices, doesn’t break automated tests, etc.
>> > 2. The code needs to be tested - typically this is accomplished both
>> manually and in an automated sense. Sometimes only manually, sometimes
>> automated.
>> >
>> > For this case, we really would need someone other than the contributor
>> who put up the PR to test this manually to verify that it works. The
>> problem is that SMB isn’t really that popular, I don’t think. So we would
>> need someone who can verify that the changes work as desired. This doesn’t
>> need to be a committer.
>> >
>> > If you’re able to build that branch and verify the changes and then
>> report back any positive or negative findings, that can go a long way to
>> help in the review process.
>> >
>> > Thanks
>> > -Mark
>> >
>> > On Mar 22, 2021, at 4:03 AM, Jens M. Kofoed 
>> wrote:
>> >
>> > Hi
>> >
>> > The following JIRA: https://issues.apache.org/jira/browse/NIFI-7863,
>> was created October 1, 2020 and the user Jaya has created a PR October 9,
>> 2020. but nothing have happens since.
>> >
>> > Are there someone in the community which is able to help implement a
>> fix?
>> > We had looked forward to see the fix included in 1.13, but
>> unfortunately it is not.
>> >
>> > Kind regards
>> > Jens M. Kofoed
>> >
>> >
>> >
>> >
>>
>


Daylightsaving issue with ListFTP and PutFTP Processor

2021-04-06 Thread Jens M. Kofoed
Hi all

Last week I noticed a strange behavior with ListFTP.
The ListFTP is configured to use the "Tracking Entities" strategy, and is
listing a folder where we are not allowed to delete/remove files after they
are fetched. During the night when daylight saving changed the time on the
servers, ListFTP listed ALL files once again.
I don't know if the entity tracking is using the local timestamp as one of
the tracking parameters instead of UTC time. But if it does, it explains why
ListFTP sees all files as new. If this is the case, I will consider it a bug.

Another issue happened with both List- and PutFTP.
The processor failed with an NPE and no further explanation. I found
out that ListFTP didn't like files which were created in the hour where
daylight saving changed the time. If I removed the files last modified in that
time window, the processor started listing files again. As soon as I moved a file
into the folder with a timestamp within the daylight-saving time window, the
ListFTP processor produced an NPE.

PutFTP was not able to overwrite an existing file which was previously saved
in the daylight-saving time window. If I "touched" the existing file to
update the last-modified time, PutFTP was happy again.

I will try to see if I can reproduce the issues.

Kind regards
Jens M. Kofoed


Re: Creating recursive missing folders with PutSmbFile

2021-03-22 Thread Jens M. Kofoed
Dear Mark

I would love to help, testing the PR to check that it is working. My only
problem is I'm not able to build the nar files my self. So if someone can
build the file, I have no problems testing the file.

About SMB:
I don't like to mount networks drives locally to my NiFi servers which is
running Linux. For accessing linux servers I am using SFTP, the big issue
are with Windows servers and those who administrate them. They like so use
Windows shares, and use AD access rights. I have tried to setup a NFS share
in Windows server 2019, but it was not easy to get it to work and
diffidently not then we had to make changes. So i order to get access to
Windows servers, I have configured MS IIS with a FTP Site.
To be able to use smb shares, I hope it would be much easier.

kind regards
Jens M. Kofoed



Den man. 22. mar. 2021 kl. 19.19 skrev Joe Witt :

> pretty sure SMB is super popular - it is just that for the cases we
> typically engage in SMB isn't used as the protocol to access data :)
>
> Agree with the rest of that
>
> Thanks
>
> On Mon, Mar 22, 2021 at 11:13 AM Mark Payne  wrote:
> >
> > Jens,
> >
> > In order to review & merge a PR, there are two important things that
> need to happen:
> >
> > 1. A NiFi committer must review the code to make sure that the changes
> are safe, make sense, conducive with the architecture, is adhering to best
> practices, doesn’t break automated tests, etc.
> > 2. The code needs to be tested - typically this is accomplished both
> manually and in an automated sense. Sometimes only manually, sometimes
> automated.
> >
> > For this case, we really would need someone other than the contributor
> who put up the PR to test this manually to verify that it works. The
> problem is that SMB isn’t really that popular, I don’t think. So we would
> need someone who can verify that the changes work as desired. This doesn’t
> need to be a committer.
> >
> > If you’re able to build that branch and verify the changes and then
> report back any positive or negative findings, that can go a long way to
> help in the review process.
> >
> > Thanks
> > -Mark
> >
> > On Mar 22, 2021, at 4:03 AM, Jens M. Kofoed 
> wrote:
> >
> > Hi
> >
> > The following JIRA: https://issues.apache.org/jira/browse/NIFI-7863,
> was created October 1, 2020 and the user Jaya has created a PR October 9,
> 2020. but nothing have happens since.
> >
> > Are there someone in the community which is able to help implement a fix?
> > We had looked forward to see the fix included in 1.13, but unfortunately
> it is not.
> >
> > Kind regards
> > Jens M. Kofoed
> >
> >
> >
> >
>


Creating recursive missing folders with PutSmbFile

2021-03-22 Thread Jens M. Kofoed
Hi

The following JIRA: https://issues.apache.org/jira/browse/NIFI-7863, was
created October 1, 2020 and the user Jaya has created a PR October 9, 2020.
but nothing have happens since.

Are there someone in the community which is able to help implement a fix?
We had looked forward to see the fix included in 1.13, but unfortunately it
is not.

Kind regards
Jens M. Kofoed


Re: Count records where value match x

2021-03-03 Thread Jens M. Kofoed
Hi Mark

Thanks for pointing me in the direction of putting conditions inside a
count. I found this, which is working:
select count(case when queuedCountProcent > 50 then 1 else null end) as count_queuedCountProcent,
       count(case when queuedBytesProcent > 50 then 1 else null end) as count_queuedBytesProcent,
       count(case when isBackPressureEnabled = 'true' then 1 else null end) as count_isBackPressureEnabled
from FLOWFILE

kind regards
Jens

Den ons. 3. mar. 2021 kl. 06.44 skrev Jens M. Kofoed :

> Hi Mark
>
> Many thanks for you replay. I have never used a "where" clauses inside a
> count in a select statement so I tried it. But it doesn't work all 4 fields
> has the same value which is equal to the total amount of records.
> if it had worked it would have been very very nice.
>
> Thanks
> Jens M. Kofoed
>
> Den tir. 2. mar. 2021 kl. 16.04 skrev Mark Payne :
>
>> Jens,
>>
>> I think you should be able to use a query like:
>>
>> SELECT
>> COUNT( (queuedBytes*100/backPressureBytesThreshold) > 60) AS
>> overSixtyPercentBytes,
>> COUNT( (queuedCount*100/backPressureObjectThreshold) > 60) AS
>> overSixtyPercentObjects,
>> COUNT( (queuedBytes*100/backPressureBytesThreshold) > 75) AS
>> overSeventyFivePercentBytes,
>> COUNT( (queuedCount*100/backPressureObjectThreshold) > 75) AS
>> overSeventyFivePercentObjects
>> FROM FLOWFILE
>>
>> Unless I’m misunderstanding what you’re after?
>>
>> Thanks
>> -Mark
>>
>> > On Mar 2, 2021, at 2:40 AM, Jens M. Kofoed 
>> wrote:
>> >
>> > Hi
>> >
>> > I'm using the SiteToSiteStatusReportingTask to monitor NIFI flows and I
>> would like to calculate how many connections has BackPressure Enabled and a
>> bunch more calculations. The end results of all the statistics has to be in
>> one final flowfile so a final result can be sent to another system.
>> >
>> > First I use a PartitionRecord, to split the records by componentType so
>> I gets all Connections in one flowfile. Using a CalculateRecordStats with a
>> property with the value of "/isBackPressureEnabled" will give 2 resuts:
>> > recordStats.isBackPressureEnabled.false = 2451
>> > recordStats.isBackPressureEnabled.true = 2
>> >
>> > But I can't figure out how to make calculations based on specific
>> fields where values has to match a criteria.
>> > Using a QueryRecords I have added 2 more fields to the records.
>> > SELECT *, (queuedBytes*100/backPressureBytesThreshold) AS
>> queuedBytesProcent, (queuedCount*100/backPressureObjectThreshold) AS
>> queuedCountProcent FROM FLOWFILE
>> >
>> > I would like to calculate how many connections has a queuedBytesProcent
>> > 60 and how many has a  queuedCountProcent > 75.
>> > And later on I would like to make calculation based on different
>> "parentPath".
>> >
>> > I can use the QueryRecords to make multiple outputs depending of
>> different statements and next use a CalculateRecordStats to calculate the
>> result. But here I will end up with multiple flowfiles and not one final
>> result.
>> >
>> > Is it possible to do it in one straight flow?
>> >
>> > kind regards
>> > Jens M. Kofoed
>>
>>


PartitionsRecord based on part of a string

2021-03-02 Thread Jens M. Kofoed
Hi

From the JSON document I get from the SiteToSiteStatusReportingTask, I
use a PartitionRecord to split the records by componentType. Next I would
like to split connections based on the 2nd level of the parentPath,
e.g. "parentPath":"NiFi Flow / main groups / subgrups / another sub" ->
NiFi Flow / main groups/*

My issue is that it is a string field and not an array. If it were an array
I could use the PartitionRecord. But since it is a string which is "space
forward slash space" separated, I'm stuck and can't figure out how to
configure the PartitionRecord.
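
One direction I am considering (a sketch only, untested; /parentPath2 is a
made-up field name, and it assumes RecordPath's replaceRegex supports Java
back-references like $1): use an UpdateRecord before the PartitionRecord to
derive a field holding just the first two path levels, then partition on
that field.

UpdateRecord (Replacement Value Strategy = Record Path Value):
  /parentPath2 = replaceRegex( /parentPath, '^(.+? / .+?)( / .+)?$', '$1' )

PartitionRecord:
  group = /parentPath2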

Please help

Kind regards
Jens M. Kofoed


Re: Count records where value match x

2021-03-02 Thread Jens M. Kofoed
Hi Mark

Many thanks for your reply. I have never used a "where" clause inside a
count in a select statement, so I tried it. But it doesn't work: all 4 fields
have the same value, which is equal to the total number of records (COUNT
counts every non-null value, and a false comparison is still non-null).
If it had worked, it would have been very, very nice.
Thanks
Jens M. Kofoed

Den tir. 2. mar. 2021 kl. 16.04 skrev Mark Payne :

> Jens,
>
> I think you should be able to use a query like:
>
> SELECT
> COUNT( (queuedBytes*100/backPressureBytesThreshold) > 60) AS
> overSixtyPercentBytes,
> COUNT( (queuedCount*100/backPressureObjectThreshold) > 60) AS
> overSixtyPercentObjects,
> COUNT( (queuedBytes*100/backPressureBytesThreshold) > 75) AS
> overSeventyFivePercentBytes,
> COUNT( (queuedCount*100/backPressureObjectThreshold) > 75) AS
> overSeventyFivePercentObjects
> FROM FLOWFILE
>
> Unless I’m misunderstanding what you’re after?
>
> Thanks
> -Mark
>
> > On Mar 2, 2021, at 2:40 AM, Jens M. Kofoed 
> wrote:
> >
> > Hi
> >
> > I'm using the SiteToSiteStatusReportingTask to monitor NIFI flows and I
> would like to calculate how many connections has BackPressure Enabled and a
> bunch more calculations. The end results of all the statistics has to be in
> one final flowfile so a final result can be sent to another system.
> >
> > First I use a PartitionRecord, to split the records by componentType so
> I gets all Connections in one flowfile. Using a CalculateRecordStats with a
> property with the value of "/isBackPressureEnabled" will give 2 resuts:
> > recordStats.isBackPressureEnabled.false = 2451
> > recordStats.isBackPressureEnabled.true = 2
> >
> > But I can't figure out how to make calculations based on specific fields
> where values has to match a criteria.
> > Using a QueryRecords I have added 2 more fields to the records.
> > SELECT *, (queuedBytes*100/backPressureBytesThreshold) AS
> queuedBytesProcent, (queuedCount*100/backPressureObjectThreshold) AS
> queuedCountProcent FROM FLOWFILE
> >
> > I would like to calculate how many connections has a queuedBytesProcent
> > 60 and how many has a  queuedCountProcent > 75.
> > And later on I would like to make calculation based on different
> "parentPath".
> >
> > I can use the QueryRecords to make multiple outputs depending of
> different statements and next use a CalculateRecordStats to calculate the
> result. But here I will end up with multiple flowfiles and not one final
> result.
> >
> > Is it possible to do it in one straight flow?
> >
> > kind regards
> > Jens M. Kofoed
>
>


Count records where value match x

2021-03-01 Thread Jens M. Kofoed
Hi

I'm using the SiteToSiteStatusReportingTask to monitor NiFi flows, and I
would like to calculate how many connections have BackPressure enabled, plus a
bunch more calculations. The end results of all the statistics have to be in
one final flowfile so a final result can be sent to another system.

First I use a PartitionRecord to split the records by componentType so I
get all Connections in one flowfile. Using a CalculateRecordStats with a
property with the value of "/isBackPressureEnabled" will give 2 results:
recordStats.isBackPressureEnabled.false = 2451
recordStats.isBackPressureEnabled.true = 2

But I can't figure out how to make calculations based on specific fields
where values have to match a criterion.
Using a QueryRecord I have added 2 more fields to the records:
SELECT *, (queuedBytes*100/backPressureBytesThreshold) AS
queuedBytesProcent, (queuedCount*100/backPressureObjectThreshold) AS
queuedCountProcent FROM FLOWFILE

I would like to calculate how many connections have a queuedBytesProcent >
60 and how many have a queuedCountProcent > 75.
And later on I would like to make calculations based on different
"parentPath" values.

I can use the QueryRecord to make multiple outputs depending on different
statements and then use a CalculateRecordStats to calculate the result. But
then I will end up with multiple flowfiles and not one final result.

Is it possible to do it in one straight flow?

kind regards
Jens M. Kofoed


Re: Issue with QueryRecord failing when data is missing

2021-02-25 Thread Jens M. Kofoed
Hi Matt

Many thanks for your reply. Yes, I use the default, which is the
infer-schema. And it makes sense that the infer-schema can't guess what
type the fields are if there is no data.
So if I search for a boolean value and the infer-schema sets it to text, it
makes sense that it produces errors. And why did I not think of that while I
sat and tore my hair out of my head :-)

As a workaround I have used a ValidateRecord and a RouteOnAttribute using
the record.count.
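
A minimal sketch of that routing (the property name has-records is made up;
record.count is the attribute set by the record-oriented processors):

RouteOnAttribute:
  has-records = ${record.count:gt(0)}

Only flowfiles that actually contain records are then sent on to QueryRecord.
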
So many thanks for the answer.

Kind regards
Jens M. Kofoed

Den tor. 25. feb. 2021 kl. 16.28 skrev Matt Burgess :

> Jens,
>
> What is the Schema Access Strategy set to in your CSVReader? If "Infer
> Schema" or "Use String Fields From Header", the setting of "Treat
> First Line As Header" should be ignored as those two options require a
> header be present anyway. If you know the schema ahead of time you
> could set it in the CSVReader rather than inferring it.
>
> For "Infer Schema", there's a bug where the inferred schema is empty
> because we don't have any records from which to infer the types of the
> fields (even though the field names are present). I wrote up NIFI-8259
> [1] to infer the types as strings when no records are present.
>
> As a workaround you could filter out any FlowFiles that have no
> records, either by using CountText or the 'record.count' attribute if
> it has been set, into a RouteOnAttribute. Alternatively you could
> emulate what NIFI-8259 is going to do by using "Use String Fields From
> Header" in your CSVReader, but in that case you might need a CAST(colC
> as BOOLEAN) in your SQL since populated FlowFiles could have the
> correctly inferred schema where empty FlowFiles (or if "Use String
> Fields From Header" is set) will think colC is a string rather than a
> boolean. The CAST should work in both cases but I didn't try it.
>
> Regards,
> Matt
>
> [1] https://issues.apache.org/jira/browse/NIFI-8259
>
> On Thu, Feb 25, 2021 at 1:56 AM Jens M. Kofoed 
> wrote:
> >
> > Hi all
> >
> > I have a issue with using the QueryRecord query csv files. currently i'm
> running NiFi version 1.12.1 but I also tested this in version 1.13.0
> > If my incoming csv file only have a header line and no data it fails
> >
> > My querying statement looks like this: SELECT colA FROM FLOWFILE WHERE
> colC = 'true'
> >
> > Changes made to the CSVReader:
> > Treat Firs Line as Header = true
> >
> > Changes made to the CSVRecordSetWriter:
> > Include Header Line = false
> > Record Separator = ,
> >
> > Here are 2 sample data. The first one works as expected, but sample 2
> gives errors
> > Sample 1:
> > colA,colB,colC
> > data1A,data1B,true
> > data2A,data2B,false
> > data3A,data3B,true
> >
> > Outcome: data1A,data3A,
> >
> > Sample 2:
> > colA,colB,colC
> >
> > Error message:
> > QueryRecord[id=d7c38f75-0177-1000--f694dd96] Unable to query
> StandardFlowFileRecord[uuid=74a71c6e-3d3f-406c-92af-c9e4e27d6d69,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1614232293848-3,
> container=Node01Cont01, section=3], offset=463,
> length=14],offset=0,name=74a71c6e-3d3f-406c-92af-c9e4e27d6d69,size=14] due
> to java.sql.SQLException: Error while preparing statement [SELECT colA FROM
> FLOWFILE WHERE colC = true]:
> org.apache.nifi.processor.exception.ProcessException:
> java.sql.SQLException: Error while preparing statement [SELECT colA FROM
> FLOWFILE WHERE colC = true]
> >
> > Is this a bug?
> >
> > kind regards
> > Jens M. Kofoed
>


Why is totalSizeCap not a default parameter in the logback.xml file

2021-02-24 Thread Jens M. Kofoed
Hi

We have unfortunately had an incident where NiFi, during a weekend, filled
up the disk with logs because a processor was failing and producing hundreds of
error messages per second.
We have changed the rollingPolicy to use daily rollover with a maxHistory
of 30 and a maxFileSize of 100MB.
When the daily logfile is going to be bigger than maxFileSize, the
rolling policy will create incrementing "subfiles" as .# for that day. But
the maxHistory does not apply to subfiles.
So with a processor producing hundreds of error messages per second, you can
end up in a situation with thousands of subfiles for each day, filling up
the disk.

There is an attribute called "totalSizeCap" which has been asked for in
JIRA:
https://issues.apache.org/jira/browse/NIFI-2203
https://issues.apache.org/jira/browse/NIFI-4315

This attribute already works, but by default it is not included in the
logback.xml file nor in the new stateless-logback.xml file.

Example:

<appender name="APP_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-stateless.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
        <fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/nifi-stateless_%d{yyyy-MM-dd_HH}.%i.log</fileNamePattern>
        <maxFileSize>100MB</maxFileSize>
        <maxHistory>30</maxHistory>
        <totalSizeCap>10GB</totalSizeCap>
        <cleanHistoryOnStart>true</cleanHistoryOnStart>
    </rollingPolicy>
    <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
        <pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
    </encoder>
</appender>

Please add this as a new default parameter

kind regards
Jens M. Kofoed


Issue with QueryRecord failing when data is missing

2021-02-24 Thread Jens M. Kofoed
Hi all

I have an issue with using QueryRecord to query CSV files. Currently I'm
running NiFi version 1.12.1, but I also tested this in version 1.13.0.
If my incoming CSV file only has a header line and no data, it fails.

My querying statement looks like this: SELECT colA FROM FLOWFILE WHERE colC
= 'true'

Changes made to the CSVReader:
Treat First Line as Header = true

Changes made to the CSVRecordSetWriter:
Include Header Line = false
Record Separator = ,

Here are 2 data samples. The first one works as expected, but sample 2 gives
errors.
Sample 1:
colA,colB,colC
data1A,data1B,true
data2A,data2B,false
data3A,data3B,true

Outcome: data1A,data3A,

Sample 2:
colA,colB,colC

Error message:
QueryRecord[id=d7c38f75-0177-1000--f694dd96] Unable to query
StandardFlowFileRecord[uuid=74a71c6e-3d3f-406c-92af-c9e4e27d6d69,claim=StandardContentClaim
[resourceClaim=StandardResourceClaim[id=1614232293848-3,
container=Node01Cont01, section=3], offset=463,
length=14],offset=0,name=74a71c6e-3d3f-406c-92af-c9e4e27d6d69,size=14] due
to java.sql.SQLException: Error while preparing statement [SELECT colA FROM
FLOWFILE WHERE colC = true]:
org.apache.nifi.processor.exception.ProcessException:
java.sql.SQLException: Error while preparing statement [SELECT colA FROM
FLOWFILE WHERE colC = true]

Is this a bug?

kind regards
Jens M. Kofoed


Re: Issue with GetFTP and arabic letters?

2021-02-23 Thread Jens M. Kofoed
Sounds really good. 
Many thanks 🙏 

Kind regards 
Jens 

> Den 23. feb. 2021 kl. 13.42 skrev Pierre Villard 
> :
> 
> Submitted a pull request and confirmed it worked with Arabic characters 
> against a public server. I was able to test GFF -> PutFTP as well as ListFTP 
> -> FetchFTP.
> We may have a 1.13.1 release coming soon so it would likely be included.
> 
> Thanks,
> Pierre
> 
>> Le mar. 23 févr. 2021 à 15:30, Jens M. Kofoed  a 
>> écrit :
>> Jira bug created: https://issues.apache.org/jira/browse/NIFI-8220
>> 
>> kind regards
>> Jens M. Kofoed
>> 
>>> Den tir. 23. feb. 2021 kl. 12.28 skrev Jens M. Kofoed 
>>> :
>>> no problem,
>>> 
>>> By the way I just tried PutFTP and it also have some issue.
>>> I created a flowfile, used an UpdateAttribute to set the filename to:  
>>> امتحان.txt   used a PutFTP (with utf8 enabled) and the filename at the ftp 
>>> server become: امتحان.txt
>>> 
>>>> Den tir. 23. feb. 2021 kl. 12.20 skrev Pierre Villard 
>>>> :
>>>> Do you mind filing a JIRA? I had a quick look and the fix should be fairly 
>>>> straightforward. Will give it a try to confirm.
>>>> 
>>>> Thanks,
>>>> Pierre
>>>> 
>>>>> Le mar. 23 févr. 2021 à 15:09, Jens M. Kofoed  a 
>>>>> écrit :
>>>>> Hi
>>>>> 
>>>>> I have now tried with NiFi version 1.13.0, and there are still issues 
>>>>> with non-English characters.
>>>>> The ListFTP process does not have an attribute "Use UTF-8 Encoding" but 
>>>>> FetchFTP has.
>>>>> I have created a file with this filename: امتحان.txt
>>>>> ListFTP list the filename as: ??.txt which does not exist and 
>>>>> therefore FetchFTP fails.
>>>>> 
>>>>> I tried to use an UpdateAttribute process to change the filename into  
>>>>> امتحان.txt but FetchFTP give my an error again
>>>>> FetchFTP[id=ce85ca8e-0177-1000--95e1e5f7] Failed to fetch content 
>>>>> for 
>>>>> StandardFlowFileRecord[uuid=823c600d-c7fa-406e-ae66-b1ac70f5e212,claim=,offset=0,name=امتحان.txt,size=0]
>>>>>  from filename FTP/امتحان.txt on remote host 192.168.1.2:21 due to 
>>>>> java.io.IOException: 550 The system cannot find the file specified. 
>>>>> ; routing to comms.failure: java.io.IOException: 550 The system cannot 
>>>>> find the file specified.
>>>>> 
>>>>> Kind regards
>>>>> Jens
>>>>> 
>>>>>> Den tir. 23. feb. 2021 kl. 09.14 skrev Pierre Villard 
>>>>>> :
>>>>>> Let us know if this still doesn't work, we will definitely want to look 
>>>>>> into it!
>>>>>> 
>>>>>>> Le mar. 23 févr. 2021 à 11:47, Jens M. Kofoed  
>>>>>>> a écrit :
>>>>>>> Hi Pierre
>>>>>>> 
>>>>>>> Many thanks, I will definitely try the new version 1.13.0. I have 
>>>>>>> already tried the List+Fetch processers in 1.12.1. And these 2 
>>>>>>> processors can take some non-English characters like Danish and Swedish 
>>>>>>> character which GetFTP can't. But they still can't read my Arabic 
>>>>>>> characters.
>>>>>>> I will try 1.13.0
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Jens  
>>>>>>> 
>>>>>>>> Den tir. 23. feb. 2021 kl. 08.17 skrev Pierre Villard 
>>>>>>>> :
>>>>>>>> Hi Jens,
>>>>>>>> 
>>>>>>>> You may want to try NiFi 1.13.0 with List+Fetch processors. The 
>>>>>>>> addition of https://issues.apache.org/jira/browse/NIFI-7685 may help. 
>>>>>>>> Worth a try at least.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Pierre
>>>>>>>> 
>>>>>>>>> Le mar. 23 févr. 2021 à 10:56, Jens M. Kofoed 
>>>>>>>>>  a écrit :
>>>>>>>>> Dear community
>>>>>>>>> 
>>>>>>>>> I have an issue which I hope some of you might be able to help me 
>>>>>>>>> with or point me in the right direction.
>>>>>>>>> We have a FTP server (MS IIS 7) where utf-8 is enabled. If I connect 
>>>>>>>>> with FileZilla I have no problems with uploading and downloading 
>>>>>>>>> files with Arabic letters in the filename.
>>>>>>>>> But then NiFi (v. 1.12.1) connect with a GetFTP process it list the 
>>>>>>>>> filenames as  instead of the Arabic letters, and is not able to 
>>>>>>>>> get the file since that filename does not exist. The GetFTP process 
>>>>>>>>> is also configured with utf-8 enabled.
>>>>>>>>> The NiFi server is using Ubuntu 18.04 and if I connect from the NiFi 
>>>>>>>>> server to the FTP server I see the same behavior, Arabic letters is 
>>>>>>>>> showing as ???
>>>>>>>>> 
>>>>>>>>> But since FileZilla can see Arabic letters correctly, I can't see 
>>>>>>>>> that the issue is at the FTP server.
>>>>>>>>> So please help
>>>>>>>>> 
>>>>>>>>> Kind regards
>>>>>>>>> Jens M. Kofoed
>>>>>>>>> 


Re: Issue with GetFTP and arabic letters?

2021-02-23 Thread Jens M. Kofoed
Jira bug created: https://issues.apache.org/jira/browse/NIFI-8220

kind regards
Jens M. Kofoed

Den tir. 23. feb. 2021 kl. 12.28 skrev Jens M. Kofoed <
jmkofoed@gmail.com>:

> no problem,
>
> By the way I just tried PutFTP and it also have some issue.
> I created a flowfile, used an UpdateAttribute to set the filename to:
> امتحان.txt   used a PutFTP (with utf8 enabled) and the filename at the ftp
> server become: امتحان.txt
>
> Den tir. 23. feb. 2021 kl. 12.20 skrev Pierre Villard <
> pierre.villard...@gmail.com>:
>
>> Do you mind filing a JIRA? I had a quick look and the fix should be
>> fairly straightforward. Will give it a try to confirm.
>>
>> Thanks,
>> Pierre
>>
>> Le mar. 23 févr. 2021 à 15:09, Jens M. Kofoed  a
>> écrit :
>>
>>> Hi
>>>
>>> I have now tried with NiFi version 1.13.0, and there are still issues
>>> with non-English characters.
>>> The ListFTP process does not have an attribute "Use UTF-8 Encoding" but
>>> FetchFTP has.
>>> I have created a file with this filename: امتحان.txt
>>> ListFTP list the filename as: ??.txt which does not exist and
>>> therefore FetchFTP fails.
>>>
>>> I tried to use an UpdateAttribute process to change the filename into
>>> امتحان.txt but FetchFTP give my an error again
>>>
>>> FetchFTP[id=ce85ca8e-0177-1000--95e1e5f7] Failed to fetch content 
>>> for 
>>> StandardFlowFileRecord[uuid=823c600d-c7fa-406e-ae66-b1ac70f5e212,claim=,offset=0,name=امتحان.txt,size=0]
>>>  from filename FTP/امتحان.txt on remote host 192.168.1.2:21 due to 
>>> java.io.IOException: 550 The system cannot find the file specified.
>>> ; routing to comms.failure: java.io.IOException: 550 The system cannot find 
>>> the file specified.
>>>
>>>
>>> Kind regards
>>> Jens
>>>
>>> Den tir. 23. feb. 2021 kl. 09.14 skrev Pierre Villard <
>>> pierre.villard...@gmail.com>:
>>>
>>>> Let us know if this still doesn't work, we will definitely want to look
>>>> into it!
>>>>
>>>> Le mar. 23 févr. 2021 à 11:47, Jens M. Kofoed 
>>>> a écrit :
>>>>
>>>>> Hi Pierre
>>>>>
>>>>> Many thanks, I will definitely try the new version 1.13.0. I have
>>>>> already tried the List+Fetch processers in 1.12.1. And these 2 processors
>>>>> can take some non-English characters like Danish and Swedish character
>>>>> which GetFTP can't. But they still can't read my Arabic characters.
>>>>> I will try 1.13.0
>>>>>
>>>>> Thanks,
>>>>> Jens
>>>>>
>>>>> Den tir. 23. feb. 2021 kl. 08.17 skrev Pierre Villard <
>>>>> pierre.villard...@gmail.com>:
>>>>>
>>>>>> Hi Jens,
>>>>>>
>>>>>> You may want to try NiFi 1.13.0 with List+Fetch processors. The
>>>>>> addition of https://issues.apache.org/jira/browse/NIFI-7685 may
>>>>>> help. Worth a try at least.
>>>>>>
>>>>>> Thanks,
>>>>>> Pierre
>>>>>>
>>>>>> Le mar. 23 févr. 2021 à 10:56, Jens M. Kofoed 
>>>>>> a écrit :
>>>>>>
>>>>>>> Dear community
>>>>>>>
>>>>>>> I have an issue which I hope some of you might be able to help me
>>>>>>> with or point me in the right direction.
>>>>>>> We have a FTP server (MS IIS 7) where utf-8 is enabled. If I connect
>>>>>>> with FileZilla I have no problems with uploading and downloading files 
>>>>>>> with
>>>>>>> Arabic letters in the filename.
>>>>>>> But then NiFi (v. 1.12.1) connect with a GetFTP process it list the
>>>>>>> filenames as  instead of the Arabic letters, and is not able to get 
>>>>>>> the
>>>>>>> file since that filename does not exist. The GetFTP process is also
>>>>>>> configured with utf-8 enabled.
>>>>>>> The NiFi server is using Ubuntu 18.04 and if I connect from the NiFi
>>>>>>> server to the FTP server I see the same behavior, Arabic letters is 
>>>>>>> showing
>>>>>>> as ???
>>>>>>>
>>>>>>> But since FileZilla can see Arabic letters correctly, I can't see
>>>>>>> that the issue is at the FTP server.
>>>>>>> So please help
>>>>>>>
>>>>>>> Kind regards
>>>>>>> Jens M. Kofoed
>>>>>>>
>>>>>>>


Re: Issue with GetFTP and arabic letters?

2021-02-23 Thread Jens M. Kofoed
no problem,

By the way, I just tried PutFTP and it also has some issues.
I created a flowfile, used an UpdateAttribute to set the filename to:
امتحان.txt   used a PutFTP (with UTF-8 enabled), and the filename at the FTP
server became: امتحان.txt
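
For context, the FTP processors sit on top of Apache Commons Net's FTPClient;
a minimal sketch of what "UTF-8 enabled" has to cover on the control channel
(host, credentials and folder are placeholders, and whether NiFi sends the
OPTS command is exactly what the JIRA should clarify):

import org.apache.commons.net.ftp.FTPClient;

public class Utf8FtpCheck {
    public static void main(String[] args) throws Exception {
        FTPClient client = new FTPClient();
        // Must be set before connect(), or the control channel stays ISO-8859-1
        client.setControlEncoding("UTF-8");
        client.connect("192.168.1.2", 21);
        client.login("user", "pass");
        // Some servers (MS IIS among them) expect the client to announce UTF-8
        client.sendCommand("OPTS UTF8", "ON");
        String[] names = client.listNames("FTP");
        if (names != null) {
            for (String name : names) {
                System.out.println(name); // Arabic filenames should survive both steps
            }
        }
        client.logout();
        client.disconnect();
    }
}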

Den tir. 23. feb. 2021 kl. 12.20 skrev Pierre Villard <
pierre.villard...@gmail.com>:

> Do you mind filing a JIRA? I had a quick look and the fix should be fairly
> straightforward. Will give it a try to confirm.
>
> Thanks,
> Pierre
>
> Le mar. 23 févr. 2021 à 15:09, Jens M. Kofoed  a
> écrit :
>
>> Hi
>>
>> I have now tried with NiFi version 1.13.0, and there are still issues
>> with non-English characters.
>> The ListFTP process does not have an attribute "Use UTF-8 Encoding" but
>> FetchFTP has.
>> I have created a file with this filename: امتحان.txt
>> ListFTP list the filename as: ??.txt which does not exist and
>> therefore FetchFTP fails.
>>
>> I tried to use an UpdateAttribute process to change the filename into
>> امتحان.txt but FetchFTP give my an error again
>>
>> FetchFTP[id=ce85ca8e-0177-1000--95e1e5f7] Failed to fetch content 
>> for 
>> StandardFlowFileRecord[uuid=823c600d-c7fa-406e-ae66-b1ac70f5e212,claim=,offset=0,name=امتحان.txt,size=0]
>>  from filename FTP/امتحان.txt on remote host 192.168.1.2:21 due to 
>> java.io.IOException: 550 The system cannot find the file specified.
>> ; routing to comms.failure: java.io.IOException: 550 The system cannot find 
>> the file specified.
>>
>>
>> Kind regards
>> Jens
>>
>> Den tir. 23. feb. 2021 kl. 09.14 skrev Pierre Villard <
>> pierre.villard...@gmail.com>:
>>
>>> Let us know if this still doesn't work, we will definitely want to look
>>> into it!
>>>
>>> Le mar. 23 févr. 2021 à 11:47, Jens M. Kofoed 
>>> a écrit :
>>>
>>>> Hi Pierre
>>>>
>>>> Many thanks, I will definitely try the new version 1.13.0. I have
>>>> already tried the List+Fetch processers in 1.12.1. And these 2 processors
>>>> can take some non-English characters like Danish and Swedish character
>>>> which GetFTP can't. But they still can't read my Arabic characters.
>>>> I will try 1.13.0
>>>>
>>>> Thanks,
>>>> Jens
>>>>
>>>> Den tir. 23. feb. 2021 kl. 08.17 skrev Pierre Villard <
>>>> pierre.villard...@gmail.com>:
>>>>
>>>>> Hi Jens,
>>>>>
>>>>> You may want to try NiFi 1.13.0 with List+Fetch processors. The
>>>>> addition of https://issues.apache.org/jira/browse/NIFI-7685 may help.
>>>>> Worth a try at least.
>>>>>
>>>>> Thanks,
>>>>> Pierre
>>>>>
>>>>> Le mar. 23 févr. 2021 à 10:56, Jens M. Kofoed 
>>>>> a écrit :
>>>>>
>>>>>> Dear community
>>>>>>
>>>>>> I have an issue which I hope some of you might be able to help me
>>>>>> with, or point me in the right direction.
>>>>>> We have an FTP server (MS IIS 7) where UTF-8 is enabled. If I connect
>>>>>> with FileZilla I have no problems uploading and downloading files with
>>>>>> Arabic letters in the filename.
>>>>>> But when NiFi (v. 1.12.1) connects with a GetFTP processor, it lists
>>>>>> the filenames as ??? instead of the Arabic letters, and it is not able
>>>>>> to get the file since that filename does not exist. The GetFTP
>>>>>> processor is also configured with UTF-8 enabled.
>>>>>> The NiFi server runs Ubuntu 18.04, and if I connect from the NiFi
>>>>>> server to the FTP server I see the same behavior: Arabic letters are
>>>>>> shown as ???
>>>>>>
>>>>>> But since FileZilla can display the Arabic letters correctly, I can't
>>>>>> see that the issue is on the FTP server side.
>>>>>> So please help.
>>>>>>
>>>>>> Kind regards
>>>>>> Jens M. Kofoed
>>>>>>
>>>>>>


Re: Issue with GetFTP and arabic letters?

2021-02-23 Thread Jens M. Kofoed
Hi

I have now tried with NiFi version 1.13.0, and there are still issues with
non-English characters.
The ListFTP processor does not have a "Use UTF-8 Encoding" property, but
FetchFTP does.
I have created a file with this filename: امتحان.txt
ListFTP lists the filename as ??.txt, which does not exist, and therefore
FetchFTP fails.

I tried to use an UpdateAttribute processor to change the filename to
امتحان.txt, but FetchFTP gives me an error again:

FetchFTP[id=ce85ca8e-0177-1000--95e1e5f7] Failed to fetch
content for 
StandardFlowFileRecord[uuid=823c600d-c7fa-406e-ae66-b1ac70f5e212,claim=,offset=0,name=امتحان.txt,size=0]
from filename FTP/امتحان.txt on remote host 192.168.1.2:21 due to
java.io.IOException: 550 The system cannot find the file specified.
; routing to comms.failure: java.io.IOException: 550 The system cannot
find the file specified.


Kind regards
Jens

Den tir. 23. feb. 2021 kl. 09.14 skrev Pierre Villard <
pierre.villard...@gmail.com>:

> Let us know if this still doesn't work, we will definitely want to look
> into it!
>
> Le mar. 23 févr. 2021 à 11:47, Jens M. Kofoed  a
> écrit :
>
>> Hi Pierre
>>
>> Many thanks, I will definitely try the new version 1.13.0. I have already
>> tried the List+Fetch processors in 1.12.1, and these 2 processors can
>> handle some non-English characters, like Danish and Swedish ones, which
>> GetFTP can't. But they still can't read my Arabic characters.
>> I will try 1.13.0
>>
>> Thanks,
>> Jens
>>
>> Den tir. 23. feb. 2021 kl. 08.17 skrev Pierre Villard <
>> pierre.villard...@gmail.com>:
>>
>>> Hi Jens,
>>>
>>> You may want to try NiFi 1.13.0 with List+Fetch processors. The addition
>>> of https://issues.apache.org/jira/browse/NIFI-7685 may help. Worth a
>>> try at least.
>>>
>>> Thanks,
>>> Pierre
>>>
>>> Le mar. 23 févr. 2021 à 10:56, Jens M. Kofoed 
>>> a écrit :
>>>
>>>> Dear community
>>>>
>>>> I have an issue which I hope some of you might be able to help me
>>>> with, or point me in the right direction.
>>>> We have an FTP server (MS IIS 7) where UTF-8 is enabled. If I connect
>>>> with FileZilla I have no problems uploading and downloading files with
>>>> Arabic letters in the filename.
>>>> But when NiFi (v. 1.12.1) connects with a GetFTP processor, it lists
>>>> the filenames as ??? instead of the Arabic letters, and it is not able
>>>> to get the file since that filename does not exist. The GetFTP
>>>> processor is also configured with UTF-8 enabled.
>>>> The NiFi server runs Ubuntu 18.04, and if I connect from the NiFi
>>>> server to the FTP server I see the same behavior: Arabic letters are
>>>> shown as ???
>>>>
>>>> But since FileZilla can display the Arabic letters correctly, I can't
>>>> see that the issue is on the FTP server side.
>>>> So please help.
>>>>
>>>> Kind regards
>>>> Jens M. Kofoed
>>>>
>>>>


Re: Issue with GetFTP and arabic letters?

2021-02-22 Thread Jens M. Kofoed
Hi Pierre

Many thanks, I will definitely try the new version 1.13.0. I have already
tried the List+Fetch processors in 1.12.1, and these 2 processors can
handle some non-English characters, like Danish and Swedish ones, which
GetFTP can't. But they still can't read my Arabic characters.
I will try 1.13.0

Thanks,
Jens

Den tir. 23. feb. 2021 kl. 08.17 skrev Pierre Villard <
pierre.villard...@gmail.com>:

> Hi Jens,
>
> You may want to try NiFi 1.13.0 with List+Fetch processors. The addition
> of https://issues.apache.org/jira/browse/NIFI-7685 may help. Worth a try
> at least.
>
> Thanks,
> Pierre
>
> Le mar. 23 févr. 2021 à 10:56, Jens M. Kofoed  a
> écrit :
>
>> Dear community
>>
>> I have an issue which I hope some of you might be able to help me
>> with, or point me in the right direction.
>> We have an FTP server (MS IIS 7) where UTF-8 is enabled. If I connect
>> with FileZilla I have no problems uploading and downloading files with
>> Arabic letters in the filename.
>> But when NiFi (v. 1.12.1) connects with a GetFTP processor, it lists
>> the filenames as ??? instead of the Arabic letters, and it is not able
>> to get the file since that filename does not exist. The GetFTP
>> processor is also configured with UTF-8 enabled.
>> The NiFi server runs Ubuntu 18.04, and if I connect from the NiFi
>> server to the FTP server I see the same behavior: Arabic letters are
>> shown as ???
>>
>> But since FileZilla can display the Arabic letters correctly, I can't
>> see that the issue is on the FTP server side.
>> So please help.
>>
>> Kind regards
>> Jens M. Kofoed
>>
>>


Issue with GetFTP and arabic letters?

2021-02-22 Thread Jens M. Kofoed
Dear community

I have an issue which I hope some of you might be able to help me with, or
point me in the right direction.
We have an FTP server (MS IIS 7) where UTF-8 is enabled. If I connect with
FileZilla I have no problems uploading and downloading files with Arabic
letters in the filename.
But when NiFi (v. 1.12.1) connects with a GetFTP processor, it lists the
filenames as ??? instead of the Arabic letters, and it is not able to get
the file since that filename does not exist. The GetFTP processor is also
configured with UTF-8 enabled.
The NiFi server runs Ubuntu 18.04, and if I connect from the NiFi server
to the FTP server I see the same behavior: Arabic letters are shown as ???

But since FileZilla can display the Arabic letters correctly, I can't see
that the issue is on the FTP server side.
So please help.

Kind regards
Jens M. Kofoed
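
A note for anyone hitting this thread today: the ??? substitution is the
classic signature of an FTP control channel decoded as ISO-8859-1. NiFi's
FTP processors sit on top of the Apache Commons Net FTPClient, and the
essential knob is the control encoding, which must be set before
connecting. A minimal standalone sketch of the mechanism, not NiFi's exact
code; host, credentials, and path are placeholders:

    import java.io.IOException;
    import org.apache.commons.net.ftp.FTPClient;
    import org.apache.commons.net.ftp.FTPFile;

    public class Utf8FtpListing {
        public static void main(String[] args) throws IOException {
            FTPClient client = new FTPClient();
            // Must be set BEFORE connect(): with the default ISO-8859-1
            // control encoding, directory listings turn Arabic filenames
            // into the '?' characters seen in this thread.
            client.setControlEncoding("UTF-8");
            client.connect("192.168.1.2", 21);
            client.login("user", "secret");          // placeholder credentials
            for (FTPFile file : client.listFiles("FTP")) {
                System.out.println(file.getName());  // e.g. امتحان.txt intact
            }
            client.logout();
            client.disconnect();
        }
    }

The same reasoning applies on the upload side: a garbled PutFTP name is the
STOR command's path being encoded with the wrong charset.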


Single FlowFile Per Node causes bug in nested version control

2021-01-28 Thread Jens M. Kofoed
Hey

I have an issue with nested PGs where the root PG continuously shows it
has local changes when it does not.
I have now found why, and how you can reproduce the bug:
1. Create an empty Process Group (PG-Root)
2. Add version control to PG-Root
3. Add an empty PG (PG-Sub) in PG-Root
4. Add version control to PG-Sub
5. Commit changes to PG-Root
6. Check that both PG-Root and PG-Sub are up-to-date with the green check
mark
7. Change "Process Group FlowFile Concurrency" to "Single FlowFile Per
Node" for PG-Sub
8. Commit changes to PG-Sub
9. Refresh the page
10. PG-Sub should have a green check mark, while PG-Root still shows local
changes
11. Commit changes to PG-Root
12. PG-Root still shows it has local changes, but it has none.

Bug created in JIRA

Kind regards


Re: [E] Re: NIFI show version changed *, but version control show no local changes

2021-01-28 Thread Jens M. Kofoed
Hey

I have found the reason why it is not working.
To reproduce this issue:
1. Create an empty Process Group (PG-Root)
2. Add version control to PG-Root
3. Add an empty PG (PG-Sub) in PG-Root
4. Add version control to PG-Sub
5. Update PG-Root
6. Check that both PG-Root and PG-Sub are up-to-date with the green check
mark
7. Change "Process Group FlowFile Concurrency" to "Single FlowFile Per
Node" for PG-Sub
8. Commit changes to PG-Sub
9. Refresh the page
10. PG-Sub should have a green check mark, while PG-Root still shows local
changes
11. Commit changes to PG-Root
12. PG-Root still shows it has local changes, but it has none.

Kind regards
Jens

Den tor. 28. jan. 2021 kl. 07.51 skrev Jens M. Kofoed <
jmkofoed@gmail.com>:

> I have tried to import all versioned flows to another NiFi server. It
> shows the same picture.
>
> kind regards
> Jens
>
> Den tor. 28. jan. 2021 kl. 07.40 skrev Jens M. Kofoed <
> jmkofoed@gmail.com>:
>
>> Hi
>>
>> I have tried to compare the 11 flows which are in the same bucket.
>> All 11 flows use the same parameter context.
>> 9 flows have nested flows which are also under version control.
>> All 5 flows which have the issue have nested flows, but there are 4 flows
>> with nested flows which have no issue.
>> 4 of the 5 flows have 2 sub-flows with a newer version. But in the
>> 5th flow, all sub-flows are up to date.
>> If I update the sub-flows to the latest version, they continually show
>> they need an update, even after being updated to the latest version.
>>
>> Kind Regards
>> Jens
>>
>>
>>
>>
>>
>> Den tor. 28. jan. 2021 kl. 07.13 skrev Jens M. Kofoed <
>> jmkofoed@gmail.com>:
>>
>>> Hi Bryan
>>>
>>> I just tried to create a new process group by importing from the
>>> registry. As soon as the new process group is created it shows the *, and
>>> moving the cursor over the star it says: Tracking to "xxx" Version 7 in
>>> "". Local changes have been made.
>>> Going to show local changes, there are none listed, and on closing the
>>> window a new popup says: This Process Group does not have any local
>>> changes.
>>> See the attached screen dumps.
>>>
>>> I also tried another group, which has nested groups under version
>>> control. Here one of the nested groups has been changed to a newer
>>> version, so the root group says there are groups which need an update.
>>> Going to the subgroup which needs an update, I can see the saved version
>>> is 3 and the newest version is 5. So I update the subgroup to the latest
>>> version. It still shows it needs an update. Going in again to update the
>>> subgroup, it says the current version is 5 and the newest version to
>>> select is 5.
>>>
>>> I have also tried to create new groups for all our version-controlled
>>> groups. Only one bucket has problems, and in this bucket 5 out of 11
>>> flows have issues. First I thought it might have something to do with
>>> Parameter Context, since this bucket is the only one where a parameter
>>> context is used. But some of the flows which have no issues also use a
>>> parameter context. I'm not able to find the needle which is common to
>>> only those 5 flows.
>>>
>>> Kind regards
>>> Jens
>>>
>>>
>>> Den ons. 27. jan. 2021 kl. 17.06 skrev Bryan Bende :
>>>
>>>> Jens,
>>>>
>>>> Is there any pattern you can identify for how to reproduce the problem?
>>>>
>>>> If you were to create a brand new empty process group and start
>>>> version control, does the problem happen? Is there a specific set of
>>>> steps after that which would put it into this state?
>>>>
>>>> -Bryan
>>>>
>>>> On Wed, Jan 27, 2021 at 10:36 AM Juan Pablo Gardella
>>>>  wrote:
>>>> >
>>>> > Any guide that explains how a process group can be graduated to upper
>>>> environments?
>>>> >
>>>> > On Wed, 27 Jan 2021 at 12:33, Joe Witt  wrote:
>>>> >>
>>>> >> There is no requirement to use the registry.  It simply gives you a
>>>> way to store versioned flows which you can reference/use from zero or more
>>>> nifi clusters/flows to help keep things in line.  Many teams use this to
>>>> ensure as flows are improved over time and worked through
>>>> dev/test/stage/prod environments that they graduate properly.

Re: [E] Re: NIFI show version changed *, but version control show no local changes

2021-01-27 Thread Jens M. Kofoed
I have tried to import all versioned flows to another NiFi server. It shows
the same picture.

kind regards
Jens

Den tor. 28. jan. 2021 kl. 07.40 skrev Jens M. Kofoed <
jmkofoed@gmail.com>:

> Hi
>
> I have tried to compare the 11 flows which are in the same bucket.
> All 11 flows use the same parameter context.
> 9 flows have nested flows which are also under version control.
> All 5 flows which have the issue have nested flows, but there are 4 flows
> with nested flows which have no issue.
> 4 of the 5 flows have 2 sub-flows with a newer version. But in the
> 5th flow, all sub-flows are up to date.
> If I update the sub-flows to the latest version, they continually show
> they need an update, even after being updated to the latest version.
>
> Kind Regards
> Jens
>
>
>
>
>
> Den tor. 28. jan. 2021 kl. 07.13 skrev Jens M. Kofoed <
> jmkofoed@gmail.com>:
>
>> Hi Bryan
>>
>> I just tried to create a new process group by importing from the
>> registry. As soon as the new process group is created it shows the *, and
>> moving the cursor over the star it says: Tracking to "xxx" Version 7 in
>> "". Local changes have been made.
>> Going to show local changes, there are none listed, and on closing the
>> window a new popup says: This Process Group does not have any local
>> changes.
>> See the attached screen dumps.
>>
>> I also tried another group, which has nested groups under version
>> control. Here one of the nested groups has been changed to a newer
>> version, so the root group says there are groups which need an update.
>> Going to the subgroup which needs an update, I can see the saved version
>> is 3 and the newest version is 5. So I update the subgroup to the latest
>> version. It still shows it needs an update. Going in again to update the
>> subgroup, it says the current version is 5 and the newest version to
>> select is 5.
>>
>> I have also tried to create new groups for all our version-controlled
>> groups. Only one bucket has problems, and in this bucket 5 out of 11
>> flows have issues. First I thought it might have something to do with
>> Parameter Context, since this bucket is the only one where a parameter
>> context is used. But some of the flows which have no issues also use a
>> parameter context. I'm not able to find the needle which is common to
>> only those 5 flows.
>>
>> Kind regards
>> Jens
>>
>>
>> Den ons. 27. jan. 2021 kl. 17.06 skrev Bryan Bende :
>>
>>> Jens,
>>>
>>> Is there any pattern you can identify for how to reproduce the problem?
>>>
>>> If you were to create a brand new empty process group and start
>>> version control, does the problem happen? Is there a specific set of
>>> steps after that which would put it into this state?
>>>
>>> -Bryan
>>>
>>> On Wed, Jan 27, 2021 at 10:36 AM Juan Pablo Gardella
>>>  wrote:
>>> >
>>> > Any guide that explains how a process group can be graduated to upper
>>> environments?
>>> >
>>> > On Wed, 27 Jan 2021 at 12:33, Joe Witt  wrote:
>>> >>
>>> >> There is no requirement to use the registry.  It simply gives you a
>>> way to store versioned flows which you can reference/use from zero or more
>>> nifi clusters/flows to help keep things in line.  Many teams use this to
>>> ensure as flows are improved over time and worked through
>>> dev/test/stage/prod environments that they graduate properly.
>>> >>
>>> >> Thanks
>>> >>
>>> >> On Wed, Jan 27, 2021 at 8:31 AM Maksym Skrynnikov <
>>> skrynnikov.mak...@verizonmedia.com> wrote:
>>> >>>
>>> >>> We use NiFi version 1.12.1, but we do not use NiFi Registry. I
>>> wonder if that's the requirement to use the registry?
>>> >>>
>>> >>> On Wed, Jan 27, 2021 at 2:25 PM Bryan Bende 
>>> wrote:
>>> >>>>
>>> >>>> Please specify the versions of NiFi and NiFi Registry. If it is not
>>> >>>> the latest (1.12.1 and 0.8.0), then it would be good to try with the
>>> >>>> latest since there have been significant improvements around this
>>> area
>>> >>>> in the last few releases.
>>> >>>>
>>> >>>> On Wed, Jan 27, 2021 at 5:45 AM Jens M. Kofoed <
>>> jmkofoed@gmail.com> wrote:
>>> >>>> >

Re: [E] Re: NIFI show version changed *, but version control show no local changes

2021-01-27 Thread Jens M. Kofoed
Hi

I have tried to compare the 11 flows which are in the same bucket.
All 11 flows use the same parameter context.
9 flows have nested flows which are also under version control.
All 5 flows which have the issue have nested flows, but there are 4 flows
with nested flows which have no issue.
4 of the 5 flows have 2 sub-flows with a newer version. But in the 5th
flow, all sub-flows are up to date.
If I update the sub-flows to the latest version, they continually show they
need an update, even after being updated to the latest version.

Kind Regards
Jens





Den tor. 28. jan. 2021 kl. 07.13 skrev Jens M. Kofoed <
jmkofoed@gmail.com>:

> Hi Bryan
>
> I just tried to create a new process group by importing from the
> registry. As soon as the new process group is created it shows the *, and
> moving the cursor over the star it says: Tracking to "xxx" Version 7 in
> "". Local changes have been made.
> Going to show local changes, there are none listed, and on closing the
> window a new popup says: This Process Group does not have any local
> changes.
> See the attached screen dumps.
>
> I also tried another group, which has nested groups under version
> control. Here one of the nested groups has been changed to a newer
> version, so the root group says there are groups which need an update.
> Going to the subgroup which needs an update, I can see the saved version
> is 3 and the newest version is 5. So I update the subgroup to the latest
> version. It still shows it needs an update. Going in again to update the
> subgroup, it says the current version is 5 and the newest version to
> select is 5.
>
> I have also tried to create new groups for all our version-controlled
> groups. Only one bucket has problems, and in this bucket 5 out of 11
> flows have issues. First I thought it might have something to do with
> Parameter Context, since this bucket is the only one where a parameter
> context is used. But some of the flows which have no issues also use a
> parameter context. I'm not able to find the needle which is common to
> only those 5 flows.
>
> Kind regards
> Jens
>
>
> Den ons. 27. jan. 2021 kl. 17.06 skrev Bryan Bende :
>
>> Jens,
>>
>> Is there any pattern you can identify for how to reproduce the problem?
>>
>> If you were to create a brand new empty process group and start
>> version control, does the problem happen? Is there a specific set of
>> steps after that which would put it into this state?
>>
>> -Bryan
>>
>> On Wed, Jan 27, 2021 at 10:36 AM Juan Pablo Gardella
>>  wrote:
>> >
>> > Any guide that explains how a process group can be graduated to upper
>> environments?
>> >
>> > On Wed, 27 Jan 2021 at 12:33, Joe Witt  wrote:
>> >>
>> >> There is no requirement to use the registry.  It simply gives you a
>> way to store versioned flows which you can reference/use from zero or more
>> nifi clusters/flows to help keep things in line.  Many teams use this to
>> ensure as flows are improved over time and worked through
>> dev/test/stage/prod environments that they graduate properly.
>> >>
>> >> Thanks
>> >>
>> >> On Wed, Jan 27, 2021 at 8:31 AM Maksym Skrynnikov <
>> skrynnikov.mak...@verizonmedia.com> wrote:
>> >>>
>> >>> We use NiFi version 1.12.1, but we do not use NiFi Registry. I
>> wonder if that's the requirement to use the registry?
>> >>>
>> >>> On Wed, Jan 27, 2021 at 2:25 PM Bryan Bende  wrote:
>> >>>>
>> >>>> Please specify the versions of NiFi and NiFi Registry. If it is not
>> >>>> the latest (1.12.1 and 0.8.0), then it would be good to try with the
>> >>>> latest since there have been significant improvements around this
>> area
>> >>>> in the last few releases.
>> >>>>
>> >>>> On Wed, Jan 27, 2021 at 5:45 AM Jens M. Kofoed <
>> jmkofoed@gmail.com> wrote:
>> >>>> >
>> >>>> > Hi
>> >>>> >
>> >>>> > We have a situation where process groups in NIFI shows they are
>> not up to date in version control. The show a *. But going to version
>> control to see local changes, there are none.
>> >>>> > NIFI reports back, there are no local changes. Submitting a new
>> version, makes no different. A new version is created, but NIFI still shows
>> the * and not the green check mark.
>> >>>> >
> >>>> > I have tried to restart Registry, which doesn't help

Re: [E] Re: NIFI show version changed *, but version control show no local changes

2021-01-27 Thread Jens M. Kofoed
Hi Bryan

I just tried to create a new process group by importing from the registry.
As soon as the new process group is created it shows the *, and moving the
cursor over the star it says: Tracking to "xxx" Version 7 in "". Local
changes have been made.
Going to show local changes, there are none listed, and on closing the
window a new popup says: This Process Group does not have any local
changes.
See the attached screen dumps.

I also tried another group, which has nested groups under version control.
Here one of the nested groups has been changed to a newer version, so the
root group says there are groups which need an update. Going to the
subgroup which needs an update, I can see the saved version is 3 and the
newest version is 5. So I update the subgroup to the latest version. It
still shows it needs an update. Going in again to update the subgroup, it
says the current version is 5 and the newest version to select is 5.

I have also tried to create new groups for all our version-controlled
groups. Only one bucket has problems, and in this bucket 5 out of 11 flows
have issues. First I thought it might have something to do with Parameter
Context, since this bucket is the only one where a parameter context is
used. But some of the flows which have no issues also use a parameter
context. I'm not able to find the needle which is common to only those 5
flows.

Kind regards
Jens


Den ons. 27. jan. 2021 kl. 17.06 skrev Bryan Bende :

> Jens,
>
> Is there any pattern you can identify for how to reproduce the problem?
>
> If you were to create a brand new empty process group and start
> version control, does the problem happen? Is there a specific set of
> steps after that which would put it into this state?
>
> -Bryan
>
> On Wed, Jan 27, 2021 at 10:36 AM Juan Pablo Gardella
>  wrote:
> >
> > Any guide that explains how a process group can be graduated to upper
> environments?
> >
> > On Wed, 27 Jan 2021 at 12:33, Joe Witt  wrote:
> >>
> >> There is no requirement to use the registry.  It simply gives you a way
> to store versioned flows which you can reference/use from zero or more nifi
> clusters/flows to help keep things in line.  Many teams use this to ensure
> as flows are improved over time and worked through dev/test/stage/prod
> environments that they graduate properly.
> >>
> >> Thanks
> >>
> >> On Wed, Jan 27, 2021 at 8:31 AM Maksym Skrynnikov <
> skrynnikov.mak...@verizonmedia.com> wrote:
> >>>
> >>> We use NiFi version 1.12.1, but we do not use NiFi Registry. I
> wonder if that's the requirement to use the registry?
> >>>
> >>> On Wed, Jan 27, 2021 at 2:25 PM Bryan Bende  wrote:
> >>>>
> >>>> Please specify the versions of NiFi and NiFi Registry. If it is not
> >>>> the latest (1.12.1 and 0.8.0), then it would be good to try with the
> >>>> latest since there have been significant improvements around this area
> >>>> in the last few releases.
> >>>>
> >>>> On Wed, Jan 27, 2021 at 5:45 AM Jens M. Kofoed <
> jmkofoed@gmail.com> wrote:
> >>>> >
> >>>> > Hi
> >>>> >
> >>>> > We have a situation where process groups in NiFi show they are not
> up to date in version control. They show a *. But going to version control
> to see local changes, there are none.
> >>>> > NiFi reports back that there are no local changes. Submitting a new
> version makes no difference. A new version is created, but NiFi still shows
> the * and not the green check mark.
> >>>> >
> >>>> > I have tried to restart Registry, which doesn't help.
> >>>> >
> >>>> > Restarting NiFi helps for a short while. After restarting NiFi, the
> process group shows the green check mark, and another group which is under
> the same version control now shows it needs an update. After updating the
> 2nd process group to the new version, this process group now shows the *
> and not the green check mark. Going to version control to see local
> changes, there are none.
> >>>> >
> >>>> > Does anybody have experience with this issue?
> >>>> >
> >>>> > bug report created:
> https://issues.apache.org/jira/browse/NIFIREG-437
> >>>> >
> >>>> > kind regards
> >>>> >
> >>>> > Jens M. Kofoed
>


Re: NIFI show version changed *, but version control show no local changes

2021-01-27 Thread Jens M. Kofoed
Sorry for not providing the versions.
But NiFi is 1.12.1 and Registry is 0.8.0.
Kind regards
Jens

> Den 27. jan. 2021 kl. 15.24 skrev Bryan Bende :
> 
> Please specify the versions of NiFi and NiFi Registry. If it is not
> the latest (1.12.1 and 0.8.0), then it would be good to try with the
> latest since there have been significant improvements around this area
> in the last few releases.
> 
>> On Wed, Jan 27, 2021 at 5:45 AM Jens M. Kofoed  
>> wrote:
>> 
>> Hi
>> 
>> We have a situation where process groups in NiFi show they are not up to
>> date in version control. They show a *. But going to version control to
>> see local changes, there are none.
>> NiFi reports back that there are no local changes. Submitting a new
>> version makes no difference. A new version is created, but NiFi still
>> shows the * and not the green check mark.
>>
>> I have tried to restart Registry, which doesn't help.
>>
>> Restarting NiFi helps for a short while. After restarting NiFi, the
>> process group shows the green check mark, and another group which is
>> under the same version control now shows it needs an update. After
>> updating the 2nd process group to the new version, this process group
>> now shows the * and not the green check mark. Going to version control
>> to see local changes, there are none.
>>
>> Does anybody have experience with this issue?
>>
>> bug report created: https://issues.apache.org/jira/browse/NIFIREG-437
>> 
>> kind regards
>> 
>> Jens M. Kofoed


NIFI show version changed *, but version control show no local changes

2021-01-27 Thread Jens M. Kofoed
Hi

We have a situation where process groups in NiFi show they are not up to
date in version control. They show a *. But going to version control to see
local changes, there are none.
NiFi reports back that there are no local changes. Submitting a new version
makes no difference. A new version is created, but NiFi still shows the *
and not the green check mark.

I have tried to restart Registry, which doesn't help.

Restarting NiFi helps for a short while. After restarting NiFi, the process
group shows the green check mark, and another group which is under the same
version control now shows it needs an update. After updating the 2nd
process group to the new version, this process group now shows the * and
not the green check mark. Going to version control to see local changes,
there are none.

Does anybody have experience with this issue?

bug report created: https://issues.apache.org/jira/browse/NIFIREG-437

kind regards

Jens M. Kofoed
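
One way to cross-check what NiFi itself records for a group's version
state, independent of the star on the canvas, is the REST API. A rough
sketch, assuming an unsecured instance and the standard
/nifi-api/versions/process-groups/{id} resource; the group id and URL are
placeholders:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class VersionStateProbe {
        public static void main(String[] args) throws Exception {
            String groupId = "REPLACE-WITH-PROCESS-GROUP-ID"; // placeholder
            URI uri = URI.create(
                    "http://localhost:8080/nifi-api/versions/process-groups/"
                    + groupId);
            // The response JSON carries the group's version control
            // information, including a state value (e.g. UP_TO_DATE or
            // LOCALLY_MODIFIED) to compare against what the canvas shows.
            HttpClient http = HttpClient.newHttpClient();
            HttpResponse<String> resp = http.send(
                    HttpRequest.newBuilder(uri).GET().build(),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println(resp.body());
        }
    }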


Monitor queue duration?

2021-01-12 Thread Jens M. Kofoed
Hi

I would like to monitor the queue duration for the input to some
MergeContent processors.
I need to check if files are queued up for a long time. If so, it might be
because a file is missing.
The queue duration information is not part of the Prometheus metrics nor
the site-to-site reports.

Please advise

Kind regards
Jens
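
As noted above, queue duration is not in the Prometheus metrics or the
site-to-site reports; one workaround is the queue listing API, which
reports how long each FlowFile has been queued. A sketch of kicking off
such a listing, assuming an unsecured instance and the flowfile-queues
listing endpoint; the connection id and URL are placeholders:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class QueueDurationCheck {
        public static void main(String[] args) throws Exception {
            String connectionId = "REPLACE-WITH-CONNECTION-ID"; // placeholder
            String base = "http://localhost:8080/nifi-api/flowfile-queues/"
                    + connectionId;
            HttpClient http = HttpClient.newHttpClient();
            // Step 1: ask NiFi to build a listing of the queue's FlowFiles.
            HttpResponse<String> started = http.send(
                    HttpRequest.newBuilder(URI.create(base + "/listing-requests"))
                            .POST(HttpRequest.BodyPublishers.noBody()).build(),
                    HttpResponse.BodyHandlers.ofString());
            // Step 2 (not shown): poll the returned listing-request id; each
            // FlowFile summary includes its queued duration, which a
            // monitoring script can threshold on (e.g. alert above 10 min).
            System.out.println(started.body());
        }
    }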


1 minutes vs 5 minutes Metrics via Prometheus

2020-12-08 Thread Jens M. Kofoed
We are looking into getting metrics from NiFi into Grafana via Prometheus.
It seems the metrics reported are the 5-minute stats and not the 1-minute
snapshot. If we look at the status history of a processor, we can see that
the history is shown at a 1-minute interval. I know the stat on the
processor shows the last 5 minutes, and it seems like it is the same
5-minute stat that is being reported as the metrics instead of the 1-minute
snapshot.

Is it possible to configure the Prometheus reporting task to report the
1-minute snapshot instead?

kind regards
Jens M. Kofoed


Unable to revert local changes with registry

2020-10-28 Thread Jens M. Kofoed
Hi

I'm running NiFi 1.12.1 and Registry 0.7.0.
Our flows in the registry are from NiFi 1.11.4, and after upgrading NiFi to
1.12.1 I'm no longer able to revert changes.
The NiFi GUI just shows "Stopping affected Processors" even though all
processors are already stopped.

Does anyone know how to fix this?
kind regards
Jens M. Kofoed


Re: Problem with InvokeHTTP not timing out

2020-10-21 Thread Jens M. Kofoed
Thanks for the reply Mark,
I have tried dumping the threads, but I'm not able to find information
about that processor. Unfortunately, the system having problems is a
sensitive system with no internet access, and I'm not allowed to transfer
data from it to the internet.
Looking in the thread-dump file for InvokeHTTP, I can find 3 Timer-Driven
Process Thread-nn entries with IDs which I can't relate to the processor in
the UI. 1 thread is RUNNING and 2 are WAITING.
After I stop and terminate the processor and dump to a new file, one of the
waiting threads is gone, but the other 2 (1 running and 1 waiting) are
also in the new dump file.
Comparing the 2 waiting threads across the dump files, nothing jumps out
at me as obviously wrong.

kind regards
Jens M. Kofoed


Den tor. 15. okt. 2020 kl. 16.17 skrev Mark Payne :

> Jens,
>
> If you encounter an issue where a processor seems 'stuck', the best course
> of action is generally to grab a thread dump (bin/nifi.sh dump
> thread-dump1.txt) and attach that. It will show exactly what the processor
> is doing at that time and help to understand where it's getting stuck.
>
> Thanks
> -Mark
>
>
> On Oct 13, 2020, at 6:46 AM, Jens M. Kofoed 
> wrote:
>
> Hi community
> I have some issues with the InvokeHTTP processor. Sometimes the processor
> does not receive a response from the web server and hangs in a waiting
> state without timing out.
> I use NiFi version 1.12.1, and the settings for the InvokeHTTP processor
> are as follows:
> Penalty duration: 30 secs
> Yield duration: 1 sec
> Scheduling strategy: Timer driven
> Concurrent tasks: 1
> Run schedule: 1 sec
> Run duration: 0 ms
> HTTP method: GET
> Connection timeout: 5 secs
> Read timeout: 30 secs
> Idle timeout: 2 mins
> Max idle connections: 1
>
> Looking in the nifi-app.log file for debug messages, I can see many
> requests followed by a response from the web server. When the UI no longer
> showed any files going through, I tried to stop the processor and looked
> in the log file.
> In the log file I can see the request to the web server, but no response.
> I am expecting a read timeout, but nothing happens.
> When trying to stop the processor in the UI, it goes from 1 task to
> 2 tasks, and after 10 min. it has still not stopped, so I have to
> terminate it.
>
> Can anyone please help?
> I have tried to create a bug report, but it is difficult when there are no
> error messages.
> https://issues.apache.org/jira/browse/NIFI-7899
>
> Kind regards
> Jens M. Kofoed
>
>
>


Problem with InvokeHTTP not timing out

2020-10-13 Thread Jens M. Kofoed
Hi community
I have some issues with the InvokeHTTP processor. Sometimes the processor
does not receive a response from the web server and hangs in a waiting
state without timing out.
I use NiFi version 1.12.1, and the settings for the InvokeHTTP processor
are as follows:
Penalty duration: 30 secs
Yield duration: 1 sec
Scheduling strategy: Timer driven
Concurrent tasks: 1
Run schedule: 1 sec
Run duration: 0 ms
HTTP method: GET
Connection timeout: 5 secs
Read timeout: 30 secs
Idle timeout: 2 mins
Max idle connections: 1

Looking in the nifi-app.log file for debug messages, I can see many
requests followed by a response from the web server. When the UI no longer
showed any files going through, I tried to stop the processor and looked in
the log file.
In the log file I can see the request to the web server, but no response. I
am expecting a read timeout, but nothing happens.
When trying to stop the processor in the UI, it goes from 1 task to 2
tasks, and after 10 min. it has still not stopped, so I have to terminate
it.

Can anyone please help?
I have tried to create a bug report, but it is difficult when there are no
error messages.
https://issues.apache.org/jira/browse/NIFI-7899

Kind regards
Jens M. Kofoed
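
For background, InvokeHTTP is built on OkHttp, where connect and read
timeouts bound individual socket operations rather than the whole
exchange; a server that holds the connection open while stalling at the
right moment can leave a call hanging, which is one plausible shape for
the behavior above. A standalone sketch of the distinction, with a
placeholder URL; callTimeout is the overall deadline:

    import java.time.Duration;
    import okhttp3.OkHttpClient;
    import okhttp3.Request;
    import okhttp3.Response;

    public class TimeoutProbe {
        public static void main(String[] args) throws Exception {
            OkHttpClient client = new OkHttpClient.Builder()
                    .connectTimeout(Duration.ofSeconds(5))  // per connect attempt
                    .readTimeout(Duration.ofSeconds(30))    // per read operation
                    .callTimeout(Duration.ofMinutes(2))     // whole call, hard cap
                    .build();
            Request request = new Request.Builder()
                    .url("http://192.168.1.2/health")       // placeholder URL
                    .build();
            try (Response response = client.newCall(request).execute()) {
                System.out.println(response.code());
            }
        }
    }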


Re: How to split json subarrays and keep root

2020-09-29 Thread Jens M. Kofoed
Hi Matt

Thank you very much for pointing me to ForkRecord. After playing around
with it, I got it to work.

kind regards
Jens

Den man. 28. sep. 2020 kl. 23.37 skrev Matt Burgess :

> Jens,
>
> Try ForkRecord [1] with "Mode" set to "Extract" and "Include Parent
> Fields" set to "true", I think that does what you're looking to do.
>
> Regards,
> Matt
>
> [1]
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.12.1/org.apache.nifi.processors.standard.ForkRecord/index.html
>
> On Fri, Sep 25, 2020 at 1:48 AM Jens M. Kofoed 
> wrote:
> >
> > Hi
> >
> > I have a JSON document with an array which I would like to split and
> > flatten.
> > In my example below, key4 is an array containing 2 documents. I need to
> > split the record based on each document in the key4 array, so I end up
> > with multiple records, where each new record has a copy of all keys
> > except key4, which should be flattened into its own document.
> > {
> >   "key1": "value1",
> >   "key2": "value2",
> >   "key3": {
> >     "key3a": "value3a",
> >     "key3b": "value3b"
> >   },
> >   "key4": [
> >     {
> >       "key4a1": "value4a1",
> >       "key4a2": "value4a2"
> >     },
> >     {
> >       "key4b1": "value4b1",
> >       "key4b2": "value4b2"
> >     }
> >   ]
> > }
> >
> > Should be like this:
> > Record 1
> > {
> >   "key1": "value1",
> >   "key2": "value2",
> >   "key3": {
> >     "key3a": "value3a",
> >     "key3b": "value3b"
> >   },
> >   "key4": {
> >     "key4a1": "value4a1",
> >     "key4a2": "value4a2"
> >   }
> > }
> >
> > Record 2
> > {
> >   "key1": "value1",
> >   "key2": "value2",
> >   "key3": {
> >     "key3a": "value3a",
> >     "key3b": "value3b"
> >   },
> >   "key4": {
> >     "key4b1": "value4b1",
> >     "key4b2": "value4b2"
> >   }
> > }
> >
> > Kind regards
> > Jens M. Kofoed
>


How to split json subarrays and keep root

2020-09-24 Thread Jens M. Kofoed
Hi

I have a JSON document with an array which I would like to split and
flatten.
In my example below, key4 is an array containing 2 documents. I need to
split the record based on each document in the key4 array, so I end up with
multiple records, where each new record has a copy of all keys except key4,
which should be flattened into its own document.
{
  "key1": "value1",
  "key2": "value2",
  "key3": {
    "key3a": "value3a",
    "key3b": "value3b"
  },
  "key4": [
    {
      "key4a1": "value4a1",
      "key4a2": "value4a2"
    },
    {
      "key4b1": "value4b1",
      "key4b2": "value4b2"
    }
  ]
}

Should be like this:
Record 1
{
  "key1": "value1",
  "key2": "value2",
  "key3": {
    "key3a": "value3a",
    "key3b": "value3b"
  },
  "key4": {
    "key4a1": "value4a1",
    "key4a2": "value4a2"
  }
}

Record 2
{
  "key1": "value1",
  "key2": "value2",
  "key3": {
    "key3a": "value3a",
    "key3b": "value3b"
  },
  "key4": {
    "key4b1": "value4b1",
    "key4b2": "value4b2"
  }
}

Kind regards
Jens M. Kofoed
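
ForkRecord with Mode set to Extract and Include Parent Fields set to true
performs this transform inside NiFi, as the reply above notes. As a
standalone illustration of the same logic, a minimal Jackson sketch with
the field names taken from the example; this is not NiFi's code:

    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import com.fasterxml.jackson.databind.node.ObjectNode;

    public class ForkKey4 {
        public static void main(String[] args) throws Exception {
            ObjectMapper mapper = new ObjectMapper();
            JsonNode root = mapper.readTree("""
                    {"key1":"value1","key2":"value2",
                     "key3":{"key3a":"value3a","key3b":"value3b"},
                     "key4":[{"key4a1":"value4a1","key4a2":"value4a2"},
                             {"key4b1":"value4b1","key4b2":"value4b2"}]}""");
            // Emit one record per element of key4, copying all other fields.
            for (JsonNode element : root.get("key4")) {
                ObjectNode copy = root.deepCopy();
                copy.set("key4", element);
                System.out.println(mapper.writeValueAsString(copy));
            }
        }
    }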


Re: Detect duplicate records

2020-08-16 Thread Jens M. Kofoed
So Robert, to understand it correctly: you have a lot of records in one flow
file, and if one record has been seen before, that record should be removed?
If true, wouldn't it be a workflow that goes through all records, record by
record, and joins the final result? So first you would have to split all
records, check each record, and join the rest, no matter if you do it inside
or outside NiFi. Right?
Split records -> hash record -> detect duplicates -> merge records

Regards Jens.

> Den 16. aug. 2020 kl. 01.17 skrev Robert R. Bruno :
> 
> Yep we were leaning towards off loading it to an external program and then 
> putting data back to nifi for final delivery.  Looks like that will be best 
> from the sounds of it.  Again thanks all!
> 
>> On Sat, Aug 15, 2020, 16:24 Josh Friberg-Wyckoff  
>> wrote:
>> If that is the case and this is high volume like you say, I would think it 
>> would be more efficient to offload the task to a separate program then 
>> having a processor for NiFi doing it.
>> 
>>> On Sat, Aug 15, 2020, 2:52 PM Otto Fowler  wrote:
>>> I was working on something for this, but in discussion with some of sme’s 
>>> on the project, decided to shelve it.  I don’t think I had gotten to the 
>>> point of a jira.
>>> 
>>> https://apachenifi.slack.com/archives/C0L9S92JY/p1589911056303500 
>>> 
>>>> On August 15, 2020 at 14:12:07, Robert R. Bruno (rbru...@gmail.com) wrote:
>>>> 
>>>> Sorry I should have been more clear.  My need is to detect if each record 
>>>> has been seen in the past.  So I need a solution that would be able to go 
>>>> record by record against something like a redis cache that would tell me 
>>>> either first time the record was seen or not and update the cache 
>>>> accordingly.  Guessing nothing like that for records exists at this point?
>>>> 
>>>> We've used DetectDuplicate to do this for entire flow files, but have the 
>>>> need to do this per record with a preference of not splitting the flow 
>>>> files.
>>>> 
>>>> Thanks all!
>>>> 
>>>>> On Sat, Aug 15, 2020, 13:38 Jens M. Kofoed  wrote:
>>>>> Just some info about DISTINCT. In MySQL a UNION is much, much faster
>>>>> than a DISTINCT. The DISTINCT creates a new temp table with the result
>>>>> of the query, sorting it and removing duplicates.
>>>>> If you make a UNION with a SELECT id=-1, the result is exactly the
>>>>> same: all duplicates are removed. A DISTINCT which takes 2 min. 45 sec.
>>>>> takes only about 15 sec. with a UNION.
>>>>> Kind regards.
>>>>>
>>>>> I don't know which engine is in NiFi.
>>>>> Jens M. Kofoed
>>>>> 
>>>>>> Den lør. 15. aug. 2020 kl. 18.08 skrev Matt Burgess 
>>>>>> :
>>>>>> In addition to the SO answer, if you know all the fields in the
>>>>>> record, you can use QueryRecord with SELECT DISTINCT field1,field2...
>>>>>> FROM FLOWFILE. The SO answer might be more performant but is more
>>>>>> complex, and QueryRecord will do the operations in-memory so it might
>>>>>> not handle very large flowfiles.
>>>>>> 
>>>>>> The current pull request for the Jira has not been active and is not
>>>>>> in mergeable shape, perhaps I'll get some time to pick it up and get
>>>>>> it across the finish line :)
>>>>>> 
>>>>>> Regards,
>>>>>> Matt
>>>>>> 
>>>>>> On Sat, Aug 15, 2020 at 11:47 AM Josh Friberg-Wyckoff
>>>>>>  wrote:
>>>>>> >
>>>>>> > Gosh, I should search the NiFi resources first. They have a current
>>>>>> > JIRA for what you are wanting:
>>>>>> > https://issues.apache.org/jira/browse/NIFI-6047
>>>>>> >
>>>>>> > On Sat, Aug 15, 2020 at 10:35 AM Josh Friberg-Wyckoff 
>>>>>> >  wrote:
>>>>>> >>
>>>>>> >> This looks interesting as well.
>>>>>> >> https://stackoverflow.com/questions/52674532/remove-duplicates-in-nifi
>>>>>> >>
>>>>>> >> On Sat, Aug 15, 2020 at 10:23 AM Josh Friberg-Wyckoff 
>>>>>> >>  wrote:
>>>>>> >>>
>>>>>> >>> In theory I would think you could use the ExecuteStreamCommand to 
>>>>>> >>> use the builtin Operating System sort commands to grab unique 
>>>>>> >>> records.  The Windows Sort command has an undocumented unique 
>>>>>> >>> option.  The sort command on Linux distros also has a unique option 
>>>>>> >>> as well.
>>>>>> >>>
>>>>>> >>> On Sat, Aug 15, 2020 at 5:53 AM Robert R. Bruno  
>>>>>> >>> wrote:
>>>>>> >>>>
>>>>>> >>>> I wanted to see if anyone knew is there a clever way to detect 
>>>>>> >>>> duplicate records much like you can with entire flow files with 
>>>>>> >>>> DetectDuplicate?  I'd really rather not have to split my records 
>>>>>> >>>> into individual flow files since this flow is such high volume.
>>>>>> >>>>
>>>>>> >>>> Thanks so much in advance.

