Re: PutSFTP TransportException (Timeout expired: 30000 MILLISECONDS) for sending data to Synology NAS - Data Timeout?

2023-04-24 Thread Joe Witt
Hello

If you went from 8 nodes with many errors to 1 node with few errors, then you
likely hit the max connection limit on the sftp server.  You can change that
value on the sftp server.  How many concurrent tasks do you allow the
processor?  If Y tasks, you will want Y times 8 connections allowed.
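For example, assuming the NAS runs OpenSSH (typical for Synology, but an assumption here), the per-server caps live in sshd_config:

```
# /etc/ssh/sshd_config -- illustrative values, assuming an OpenSSH-based NAS
# 8 NiFi nodes x Y = 5 concurrent PutSFTP tasks => up to 40 connections
MaxStartups 50:30:100   # start throttling unauthenticated connections at 50
MaxSessions 50          # sessions allowed per single network connection
```

Restart sshd after editing; the exact mechanism on a Synology NAS may differ from a stock sshd_config.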

Thanks

On Mon, Apr 24, 2023 at 12:23 AM  wrote:

> Hi guys
>
>
>
> We just upgraded our 8 node NiFi cluster to v1.21.0 and we are hitting a
> SSH timeout issue with the *PutSFTP* processor.
>
>
>
> May be someone can help us to find the root cause for our issue or guide
> us into the right direction. The error message below must be related to the
> “*Data Timeout*”, as this is (in our case) the only user configurable
> timeout which is set to 30 seconds.
>
>
>
> *Error Message:*
>
> 2023-04-22 14:41:45,422 ERROR [Timer-Driven Process Thread-11]
> o.a.nifi.processors.standard.PutSFTP
> PutSFTP[id=a73a340e-81f8-1f21-8f04-9bb06b767d7d] Unable to transfer
> StandardFlowFileRecord[uuid=c8be51dc-074c-429d-8b52-077a91b4e339,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1682167305368-1016468,
> container=default, section=660], offset=0,
> length=441044],offset=0,name=aaa-01_detail-20230422-143800_0200.gz,size=441044]
> to remote host nas-lan-01.my.net due to
> org.apache.nifi.processors.standard.socket.ClientConnectException: SSH
> Client connection failed [nas-lan-01.my.net:22]:
> net.schmizz.sshj.transport.TransportException: Timeout expired: 30000
> MILLISECONDS; routing to failure
>
> net.schmizz.sshj.transport.TransportException: Timeout expired: 30000
> MILLISECONDS
>
>   at
> net.schmizz.sshj.transport.TransportException$1.chain(TransportException.java:33)
>
>   at
> net.schmizz.sshj.transport.TransportException$1.chain(TransportException.java:27)
>
>   at net.schmizz.concurrent.Promise.retrieve(Promise.java:139)
>
>   at net.schmizz.concurrent.Event.await(Event.java:105)
>
>   at
> net.schmizz.sshj.transport.KeyExchanger.waitForDone(KeyExchanger.java:148)
>
>   at
> net.schmizz.sshj.transport.KeyExchanger.startKex(KeyExchanger.java:143)
>
>   ...
>
>
>
>
>
> Below is a Splunk graph which shows the number of error messages per hour.
> The behavior changed when we switched the queue load-balance strategy to
> “single node”. So instead of 8 nodes, only 1 node was doing the PutSFTP; the
> single remaining PutSFTP processor processed more data in a shorter timeframe
> and the errors were gone (there are still some errors, but we have a lot of
> PutSFTP processors in our environment and my filter was not that specific).
>
> So our question is: how could it be that the session can be initiated but
> no data can be transferred? Any ideas? Is there any mechanism which reuses
> an existing connection? I would assume not? The batch size is set to the
> default (500) and one flow file has about 7MB average size. The source data
> is fetched from Kafka and arrives very regularly… If it were an issue
> on the NAS side, we would assume that it doesn’t matter whether one NiFi node
> does the PutSFTP or 8 nodes, but it clearly makes a difference when we change
> the load-balancing strategy… so we see the culprit clearly on the NiFi side.
>
>
>
> Cheers Josef
>
>
>
>
>
>
>


Re: How to debug why a node isn't rejoining a cluster

2023-05-02 Thread Joe Witt
Mike

If nothing is in the logs on the impacted node or anywhere else in the
cluster, you will want to grab a thread dump on the affected node and see
what is actually happening.  There have to be logs though.  If you have a
large backlog then you could be waiting for the initial repository health
check, which we really need to log more about...

Thanks

On Tue, May 2, 2023 at 7:03 AM Mike Thomsen  wrote:

> I have a node in my dev cluster that is disconnected and won't reconnect.
> I'm tailing the nifi-app.log while trying to reconnect it, but nothing's
> getting logged. The logback configuration is standard for NiFi AFAIK. Are
> there any additional loggers that need to be enabled to figure out why it's
> not reconnecting? It seems like it's just silently failing.
>
> Thanks,
>
> Mike
>


[ANNOUNCE] Apache NiFi 1.21.0 release.

2023-04-07 Thread Joe Witt
Hello

The Apache NiFi team would like to announce the release of Apache NiFi 1.21.0.

Apache NiFi is an easy to use, powerful, and reliable system to
process and distribute data.  Apache NiFi was made for dataflow.  It
supports highly configurable directed graphs of data routing,
transformation, and system mediation logic.

More details on Apache NiFi can be found here:
https://nifi.apache.org/

The release artifacts can be downloaded from here:
https://nifi.apache.org/download.html

Maven artifacts have been made available and mirrored as per normal
ASF artifact processes.

Issues closed/resolved for this list can be found here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12352899

Release note highlights can be found here:
https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.21.0

Thank you
The Apache NiFi team


[ANNOUNCE] Apache NiFi 1.20.0 release.

2023-02-09 Thread Joe Witt
Hello

The Apache NiFi team would like to announce the release of Apache NiFi 1.20.0.

Apache NiFi is an easy to use, powerful, and reliable system to
process and distribute
data.  Apache NiFi was made for dataflow.  It supports highly
configurable directed graphs
of data routing, transformation, and system mediation logic.

More details on Apache NiFi can be found here:
https://nifi.apache.org/

The release artifacts can be downloaded from here:
https://nifi.apache.org/download.html

Maven artifacts have been made available and mirrored as per normal
ASF artifact processes.

Issues closed/resolved for this list can be found here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12352581

Release note highlights can be found here:
https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.20.0

Thank you
The Apache NiFi team



Re: UI SocketTimeoutException - heavy IO

2023-07-12 Thread Joe Witt
Joe

I don't recall the specific version in which we got it truly sorted, but
there was an issue with our default settings for an important content repo
property and how we handled a mixture of large/small flowfiles written within
the same underlying slab/claim in the content repository.

Please check what you have for conf/nifi.properties
  nifi.content.claim.max.appendable.size=

What value do you have there?  I recommend reducing it to 50KB and
restarting.

Can you show your full 'nifi.content' section from the nifi.properties?

Thanks

On Wed, Jul 12, 2023 at 7:54 AM Joe Obernberger <
joseph.obernber...@gmail.com> wrote:

> Raising this thread from the dead...
> Having issues with IO to the flowfile repository.  NiFi will show 500k
> flow files and a size of ~1.7G - but the size on disk on each of the 4
> nodes is massive - over 100G, and disk IO to the flowfile spindle is just
> pegged doing writes.
>
> I do have ExtractText processors that take the flowfile content (.*) and
> put it into an attribute, but the sizes of these are maybe 10k at
> most.  How can I find out which module (there are some 2200) is causing
> the issue?  I think I'm doing something fundamentally wrong with NiFi.  :)
> Perhaps I should change the size of all the queues to something less than
> 10k/1G?
>
> Under cluster/FLOWFILE STORAGE, one of the nodes shows 3.74TBytes of
> usage, but it's actually ~150G on disk.  The other nodes are correct.
>
> Ideas on what to debug?
> Thank you!
>
> -Joe (NiFi 1.18)
> On 3/22/2023 12:49 PM, Mark Payne wrote:
>
> OK. So changing the checkpoint interval to 300 seconds might help reduce
> IO a bit. But it will cause the repo to become much larger, and it will
> take much longer to startup whenever you restart NiFi.
>
> The variance in size between nodes is likely due to how recently it’s
> checkpointed. If it stays large like 31 GB while the other stay small, that
> would be interesting to know.
>
> Thanks
> -Mark
>
>
> On Mar 22, 2023, at 12:45 PM, Joe Obernberger
>   wrote:
>
> Thanks for this Mark.  I'm not seeing any large attributes at the moment
> but will go through this and verify - but I did have one queue that was set
> to 100k instead of 10k.
> I set the nifi.cluster.node.connection.timeout to 30 seconds (up from 5)
> and the nifi.flowfile.repository.checkpoint.interval to 300 seconds (up
> from 20).
>
> While it's running the size of the flowfile repo varies (wildly?) on each
> of the nodes from 1.5G to over 30G.  Disk IO is still very high, but it's
> running now and I can use the UI.  Interestingly at this point the UI shows
> 677k files and 1.5G of flow.  But disk usage on the flowfile repo is 31G,
> 3.7G, and 2.6G on the 3 nodes.  I'd love to throw some SSDs at this
> problem.  I can add more nifi nodes.
>
> -Joe
> On 3/22/2023 11:08 AM, Mark Payne wrote:
>
> Joe,
>
> The errors noted are indicating that NiFi cannot communicate with
> registry. Either the registry is offline, NiFi’s Registry Client is not
> configured properly, there’s a firewall in the way, etc.
>
> A FlowFile repo of 35 GB is rather huge. This would imply one of 3 things:
> - You have a huge number of FlowFiles (doesn’t seem to be the case)
> - FlowFiles have a huge number of attributes
> or
> - FlowFiles have 1 or more huge attribute values.
>
> Typically, FlowFile attribute should be kept minimal and should never
> contain chunks of contents from the FlowFile content. Often when we see
> this type of behavior it’s due to using something like ExtractText or
> EvaluateJsonPath to put large blocks of content into attributes.
>
> And in this case, setting Backpressure Threshold above 10,000 is even more
> concerning, as it means even greater disk I/O.
>
> Thanks
> -Mark
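A rough back-of-the-envelope (all numbers mine, for illustration only, not NiFi's actual on-disk record format) of why copying content into attributes inflates the FlowFile repo:

```python
def estimated_flowfile_repo_bytes(num_flowfiles: int, avg_attr_bytes: int,
                                  overhead_bytes: int = 200) -> int:
    """Very rough model: the FlowFile repository persists every attribute of
    every FlowFile, plus some per-record metadata (overhead value assumed)."""
    return num_flowfiles * (avg_attr_bytes + overhead_bytes)

GiB = 1024 ** 3
# 500k FlowFiles with small (~1 KB) attribute maps: well under 1 GiB...
small = estimated_flowfile_repo_bytes(500_000, 1_024)
# ...but ~64 KB of content copied into attributes grows the repo to tens of
# GiB, the kind of 30+ GB sizes reported in this thread.
big = estimated_flowfile_repo_bytes(500_000, 64 * 1024)
assert small < 1 * GiB and big > 30 * GiB
```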
>
>
> On Mar 22, 2023, at 11:01 AM, Joe Obernberger
>   wrote:
>
> Thank you Mark.  These are SATA drives - but there's no way for the
> flowfile repo to be on multiple spindles.  It's not huge - maybe 35G per
> node.
> I do see a lot of messages like this in the log:
>
> 2023-03-22 10:52:13,960 ERROR [Timer-Driven Process Thread-62]
> o.a.nifi.groups.StandardProcessGroup Failed to synchronize
> StandardProcessGroup[identifier=861d3b27-aace-186d-bbb7-870c6fa65243,name=TIKA
> Handle Extract Metadata] with Flow Registry because could not retrieve
> version 1 of flow with identifier d64e72b5-16ea-4a87-af09-72c5bbcd82bf in
> bucket 736a8f4b-19be-4c01-b2c3-901d9538c5ef due to: Connection refused
> (Connection refused)
> 2023-03-22 10:52:13,960 ERROR [Timer-Driven Process Thread-62]
> o.a.nifi.groups.StandardProcessGroup Failed to synchronize
> StandardProcessGroup[identifier=bcc23c03-49ef-1e41-83cb-83f22630466d,name=WriteDB]
> with Flow Registry because could not retrieve version 2 of flow with
> identifier ff197063-af31-45df-9401-e9f8ba2e4b2b in bucket
> 736a8f4b-19be-4c01-b2c3-901d9538c5ef due to: Connection refused (Connection
> refused)
> 2023-03-22 10:52:13,960 ERROR [Timer-Driven Process Thread-62]
> o.a.nifi.groups.StandardProcessGroup 

Re: UI SocketTimeoutException - heavy IO

2023-07-12 Thread Joe Witt
Ah ok.  And 'data/5' is its own partition (same physical disk as data/4?).
And data/5 is where you see those large files?  Can you show what you see
there in terms of files/sizes?

For the checkpoint period the default is 20 seconds.  I'm curious to
know what benefit moving to 300 seconds was giving (might be perfectly fine
for some cases - just curious).

Thanks

On Wed, Jul 12, 2023 at 8:18 AM Joe Obernberger <
joseph.obernber...@gmail.com> wrote:

> Thank you Joe -
> The content repo doesn't seem to be the issue - it's the flowfile repo.
> Here is the section from one of the nodes:
>
>
> nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
> nifi.content.claim.max.appendable.size=50 KB
> nifi.content.repository.directory.default=/data/4/nifi_content_repository
> nifi.content.repository.archive.max.retention.period=2 days
> nifi.content.repository.archive.max.usage.percentage=50%
> nifi.content.repository.archive.enabled=false
> nifi.content.repository.always.sync=false
> nifi.content.viewer.url=../nifi-content-viewer/
>
>
> nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.WriteAheadFlowFileRepository
>
> nifi.flowfile.repository.wal.implementation=org.apache.nifi.wali.SequentialAccessWriteAheadLog
> nifi.flowfile.repository.directory=/data/5/nifi_flowfile_repository
> nifi.flowfile.repository.checkpoint.interval=300 secs
> nifi.flowfile.repository.always.sync=false
> nifi.flowfile.repository.retain.orphaned.flowfiles=true
>
> -Joe
> On 7/12/2023 11:07 AM, Joe Witt wrote:
>
> Joe
>
> I dont recall the specific version in which we got it truly sorted but
> there was an issue with our default settings for an important content repo
> property and how we handled mixture of large/small flowfiles written within
> the same underlying slab/claim in the content repository.
>
> Please check what you have for conf/nifi.properties
>   nifi.content.claim.max.appendable.size=
>
> What value do you have there?  I recommend reducing it to 50KB and
> restarting.
>
> Can you show your full 'nifi.content' section from the nifi.properties?
>
> Thanks
>
> On Wed, Jul 12, 2023 at 7:54 AM Joe Obernberger <
> joseph.obernber...@gmail.com> wrote:
>
>> Raising this thread from the dead...
>> Having issues with IO to the flowfile repository.  NiFi will show 500k
>> flow files and a size of ~1.7G - but the size on disk on each of the 4
>> nodes is massive - over 100G, and disk IO to the flowfile spindle is just
>> pegged doing writes.
>>
>> I do have ExtractText processors that take the flowfile content (.*) and
>> put it into an attribute, but the sizes of these is maybe in the 10k at
>> most size.  How can I find out what module (there are some 2200) is causing
>> the issue?  I think I'm doing something fundamentally wrong with NiFi.  :)
>> Perhaps I should change the size of all the queues to something less than
>> 10k/1G?
>>
>> Under cluster/FLOWFILE STORAGE, one of the nodes shows 3.74TBytes of
>> usage, but it's actually ~150G on disk.  The other nodes are correct.
>>
>> Ideas on what to debug?
>> Thank you!
>>
>> -Joe (NiFi 1.18)
>> On 3/22/2023 12:49 PM, Mark Payne wrote:
>>
>> OK. So changing the checkpoint internal to 300 seconds might help reduce
>> IO a bit. But it will cause the repo to become much larger, and it will
>> take much longer to startup whenever you restart NiFi.
>>
>> The variance in size between nodes is likely due to how recently it’s
>> checkpointed. If it stays large like 31 GB while the other stay small, that
>> would be interesting to know.
>>
>> Thanks
>> -Mark
>>
>>
>> On Mar 22, 2023, at 12:45 PM, Joe Obernberger
>>   wrote:
>>
>> Thanks for this Mark.  I'm not seeing any large attributes at the moment
>> but will go through this and verify - but I did have one queue that was set
>> to 100k instead of 10k.
>> I set the nifi.cluster.node.connection.timeout to 30 seconds (up from 5)
>> and the nifi.flowfile.repository.checkpoint.interval to 300 seconds (up
>> from 20).
>>
>> While it's running the size of the flowfile repo varies (wildly?) on each
>> of the nodes from 1.5G to over 30G.  Disk IO is still very high, but it's
>> running now and I can use the UI.  Interestingly at this point the UI shows
>> 677k files and 1.5G of flow.  But disk usage on the flowfile repo is 31G,
>> 3.7G, and 2.6G on the 3 nodes.  I'd love to throw some SSDs at this
>> problem.  I can add more nifi nodes.
>>
>> -Joe
>> On 3/22/2023 11:08 AM, Mark Payne wrote:
>>
>> Joe,
&

Re: Schema Registry

2023-07-19 Thread Joe Witt
And looks like with NiFi 2.0 though we will drop the Hortonworks Schema
Registry as shown here [1].  If maintenance becomes more active for it we
could revisit as needed.

[1] https://issues.apache.org/jira/browse/NIFI-11095



On Wed, Jul 19, 2023 at 12:32 PM Joe Witt  wrote:

> Hello
>
> You will want to check with Cloudera for the status of the Schema Registry
> in question here.  From a NiFi point of view we will continue to integrate
> with schema registries we see in common usage and it should be set up well
> to plug in new ones and phase older ones as we go.  Support for the
> Hwx/Cldr one is there. Support for Confluent is there as is our built-in
> simple one.  There is a new registry supported by Redhat that looks
> interesting, etc..
>
> Thanks
>
> On Wed, Jul 19, 2023 at 12:28 PM Dennis Suhari 
> wrote:
>
>> Hi,
>> as far as I know the Schema registry has developed into
>> https://docs.cloudera.com/runtime/7.2.16/schema-registry-overview/topics/csp-schema_registry_overview.html
>>  with
>> features like authorization and proper get schema (without proprietary
>> envelope) methods.
>>
>> Br,
>>
>> Dennis
>>
>> Von meinem iPhone gesendet
>>
>> Am 19.07.2023 um 02:56 schrieb Jennifer L Tress :
>>
>> 
>> Hello,
>>
>> Is the Hortonworks Schema Registry still actively developed and
>> supported? I would like to use RecordReaders and Writers in my NiFi data
>> flow for transformations between schemas. Since Hortonworks was bought
>> by Cloudera, I wanted to check that record serialization was still a use
>> case supported by NiFi and verify what schema registry to use.
>>
>> Thank you!
>>
>>


Re: Schema Registry

2023-07-19 Thread Joe Witt
Hello

You will want to check with Cloudera for the status of the Schema Registry
in question here.  From a NiFi point of view, we will continue to integrate
with schema registries we see in common usage, and it should be set up well
to plug in new ones and phase out older ones as we go.  Support for the
Hwx/Cldr one is there. Support for Confluent is there, as is our built-in
simple one.  There is a new registry supported by Red Hat that looks
interesting, etc.

Thanks

On Wed, Jul 19, 2023 at 12:28 PM Dennis Suhari  wrote:

> Hi,
> as far as I know the Schema registry has developed into
> https://docs.cloudera.com/runtime/7.2.16/schema-registry-overview/topics/csp-schema_registry_overview.html
>  with
> features like authorization and proper get schema (without proprietary
> envelope) methods.
>
> Br,
>
> Dennis
>
> Von meinem iPhone gesendet
>
> Am 19.07.2023 um 02:56 schrieb Jennifer L Tress :
>
> 
> Hello,
>
> Is the Hortonworks Schema Registry still actively developed and supported?
> I would like to use RecordReaders and Writers in my NiFi data flow for
> transformations between schemas. Since Hortonworks was bought by
> Cloudera, I wanted to check that record serialization was still a use
> case supported by NiFi and verify what schema registry to use.
>
> Thank you!
>
>


Re: NiFi not rolling logs

2023-07-07 Thread Joe Witt
Hm.  Interesting.  Can you capture these bits of fun in a Jira?

Thanks

On Fri, Jul 7, 2023 at 7:17 PM Mike Thomsen  wrote:

> After doing some research, it appears that <maxHistory> is a wonky
> setting WRT how well it's honored by logback. I let a GenerateFlowFile >
> LogAttribute flow run for a long time, and it just kept filling up. When I
> added <totalSizeCap> that appeared to force expected behavior on total log
> size. We might want to add the following:
>
> <cleanHistoryOnStart>true</cleanHistoryOnStart>
> <totalSizeCap>50GB</totalSizeCap>
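For context, in logback these settings belong on the appender's rolling policy. A sketch of an app-log appender section with the additions (values illustrative; element names from logback's documented SizeAndTimeBasedRollingPolicy):

```xml
<appender name="APP_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
        <fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app_%d{yyyy-MM-dd_HH}.%i.log</fileNamePattern>
        <maxFileSize>100MB</maxFileSize>
        <maxHistory>30</maxHistory>
        <!-- cap total retained log size and prune old files at startup -->
        <cleanHistoryOnStart>true</cleanHistoryOnStart>
        <totalSizeCap>50GB</totalSizeCap>
    </rollingPolicy>
    <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
        <pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
    </encoder>
</appender>
```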
>
> On Fri, Jul 7, 2023 at 11:33 AM Michael Moser  wrote:
>
>> Hi Mike,
>>
>> You aren't alone in experiencing this.  I think logback uses a pattern
>> matcher on the filename to discover files to delete.  If "something" happens
>> which causes a gap in the date pattern, then the matcher will fail to
>> pick up and delete files on the other side of that gap.
>>
>> Regards,
>> -- Mike M
>>
>>
>> On Thu, Jul 6, 2023 at 10:28 AM Mike Thomsen 
>> wrote:
>>
>>> We are using the stock configuration, and have noticed that we have a
>>> lot of nifi-app* logs that are well beyond the historic data cap of 30 days
>>> in logback.xml; some of those logs go back to April. We also have a bunch
>>> of 0 byte nifi-user logs and some of the other logs are 0 bytes as well. It
>>> looks like logback is rotating based on time, but isn't cleaning up. Is
>>> this expected behavior or a problem with the configuration?
>>>
>>> Thanks,
>>>
>>> Mike
>>>
>>


Re: Processor ID/Processor Name as a default attribute

2023-05-24 Thread Joe Witt
Hello

Can you describe how you would use this information?

 These kinds of details and more are present in provenance data now.

Thanks

On Wed, May 24, 2023 at 7:45 AM Chirthani, Deepak Reddy <
c-deepakreddy.chirth...@charter.com> wrote:

> Is there any chance that Processor_ID or Processor_Name can be added as a
> default attribute for each flowfile, so that its value would be the ID/Name
> of the most recent processor that processed it, regardless of the
> relationship it is sent to?
>
>
>
> https://issues.apache.org/jira/browse/NIFI-4284
>
>
>
> Thanks
>
>
>
>
> *Deepak Reddy* | Data Engineer
> ​IT Centers of Excellence
> 13736 Riverport Dr., Maryland Heights, MO 63043
> 
>
>
>


Fwd: [ANNOUNCE] Apache NiFi 1.22.0 release.

2023-06-11 Thread Joe Witt
-- Forwarded message -
From: Joe Witt 
Date: Sun, Jun 11, 2023 at 9:27 PM
Subject: [ANNOUNCE] Apache NiFi 1.22.0 release.
To: 


Hello

The Apache NiFi team would like to announce the release of Apache NiFi 1.22.0.

Apache NiFi is an easy to use, powerful, and reliable system to
process and distribute
data.  Apache NiFi was made for dataflow.  It supports highly
configurable directed graphs
of data routing, transformation, and system mediation logic.

More details on Apache NiFi can be found here:
https://nifi.apache.org/

The release artifacts can be downloaded from here:
https://nifi.apache.org/download.html

Maven artifacts have been made available and mirrored as per normal
ASF artifact processes.

Issues closed/resolved for this list can be found here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12353069

Release note highlights can be found here:
https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.22.0

Thank you
The Apache NiFi team


Re: source build issue - Nifi 1.23.0

2023-08-07 Thread Joe Witt
Greg

Yeah, it seems likely there is a build issue with the RPM profile. I think we
just don't have enough people exercising that thing often enough anymore, and
so when it breaks it is silent.  The only thing we meant to have an RPM for,
as far as I recall, was the nifi assembly itself, but perhaps we had an RPM
target in the toolkit.  Probably best to file a JIRA from here with as much
detail as you can, including the output of 'mvn -version'.

Thanks

On Mon, Aug 7, 2023 at 8:59 AM Gregory M. Foreman <
gfore...@spinnerconsulting.com> wrote:

> Hello:
>
> I am trying to build Nifi 1.23.0 from source using:
>
> JAVA_OPTS="-Xms128m -Xmx4g"
> MAVEN_OPTS="-Dorg.slf4j.simpleLogger.defaultLogLevel=ERROR -Xms1024m
> -Xmx3076m -XX:MaxPermSize=256m"
>
> mvn -T C2.0 --batch-mode -Prpm -DskipTests clean install
>
> but the following error is thrown:
>
> [ERROR] Failed to execute goal
> org.codehaus.mojo:rpm-maven-plugin:2.2.0:attached-rpm (build-bin-rpm) on
> project nifi-toolkit-assembly: Source location
> /root/test/nifi-1.23.0/nifi-toolkit/nifi-toolkit-assembly/target/nifi-toolkit-1.23.0-bin/nifi-toolkit-1.23.0/LICENSE
> does not exist -> [Help 1]
>
> The path above ends at:
>
> /root/test2/nifi/nifi-toolkit/nifi-toolkit-assembly/target/
>
> There is a zip file within it that has not been unzipped:
>
> nifi-toolkit-1.23.0-bin.zip
>
> Is this an issue with the build?  Or am I missing some config setting?
>
>
> Thanks,
> Greg


Re: Need Help in migrating Giant CSV from S3 to SFTP

2023-05-09 Thread Joe Witt
Nilesh,

These processors generally are not memory sensitive, as they should
only ever have small amounts in memory at a time, so it is likely this
should work well up to 100s-of-GB objects and so on.  We of course don't
really test at that scale, but it is technically reasonable and designed as
such.  So what would be the bottleneck?  It is exactly what Eric is
flagging.

You will need a large content repository available large enough to hold as
much data in flight as you'll have at any one time.  It looks like you have
single files as large as 400GB with some being 100s or 10s of GB as well
and I'm guessing many can happen at/around one time.  So you'll need a far
larger content repository than you're currently using.  It shows that free
space on any single node is on average 140GB which means you have very
little head room for what you're trying to do.  You should try to have a TB
or more available for this kind of case (per node).

You mention it fails but please provide information showing how/the logs.

Also please do not use load balancing on every connection.  You want to use
that feature selectively/by design choices.  For now - I'd avoid it
entirely or just use it between listing and fetching.  But certainly not
after fetching given how massive the content is that would have to be
shuffled around.

Thanks
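A sketch of the sizing arithmetic behind this advice (the headroom factor is my assumption, not a NiFi constant):

```python
GiB = 1024 ** 3

def min_content_repo_bytes(max_file_bytes: int, max_concurrent_files: int,
                           headroom: float = 2.0) -> int:
    """The content repository must hold all in-flight content at once; the
    headroom factor (assumed) leaves room for claims not yet reclaimed."""
    return int(max_file_bytes * max_concurrent_files * headroom)

# A single 400 GiB object in flight with 2x headroom -> ~800 GiB per node,
# in line with the "a TB or more" guidance above.
needed = min_content_repo_bytes(400 * GiB, 1)
assert needed == 800 * GiB
assert needed > 140 * GiB  # the ~140 GB of free space cited is far too little
```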

On Tue, May 9, 2023 at 9:07 AM Kumar, Nilesh via users <
users@nifi.apache.org> wrote:

> Hi Eric
>
>
>
> I see the following for my content repository. Can you please help me with
> how to tweak it further? I have deployed NiFi on K8s as a 3-replica-pod
> cluster, with no resource limits. But I guess the pod CPU/memory will be
> throttled by node capacity itself. I noticed that since I have one single
> file of 400GB, all the load goes to whichever node picks up the
> transfer. I wanted to know if there is any other way of configuring
> the flow. If not, please tell me the NiFi metrics to tweak.
>
>
> *From:* Eric Secules 
> *Sent:* Tuesday, May 9, 2023 9:26 PM
> *To:* users@nifi.apache.org; Kumar, Nilesh 
> *Subject:* [EXT] Re: Need Help in migrating Giant CSV from S3 to SFTP
>
>
>
> Hi Nilesh,
>
>
>
> Check the size of your content repository. If you want to transfer a 400GB
> file through nifi, your content repository must be greater than 400GB,
> someone else might have a better idea of how much bigger you need. But
> generally it all depends on how many of these big files you want to
> transfer at the same time. You can check the content repository metrics in
> the Node Status from the hamburger menu in the top right corner of the
> canvas.
>
>
>
> -Eric
>
>
>
> On Tue., May 9, 2023, 8:42 a.m. Kumar, Nilesh via users, <
> users@nifi.apache.org> wrote:
>
> Hi Team,
>
> I want to move a very large file, like 400GB, from S3 to SFTP. I have used
> ListS3 -> FetchS3 -> PutSFTP. This works for smaller files up to 30GB but
> fails for larger (100GB) files. Is there any way to configure this flow so
> that it handles a very large single file? If there is any template that
> exists, please share.
>
> My configuration are all standard processor configuration.
>
>
>
> Thanks,
>
> Nilesh
>
>
>
>
>
>
>
>
>


Re: how SCP full directory to remote location

2023-05-11 Thread Joe Witt
Hello

NiFi doesn't offer an SCP specific processor.  Instead you would use
List/FetchSFTP to pull from the target directory and remote source server
(and recurse to subdirs as desired) and then use PutSFTP to write to the
target directory (including subdirs as desired) of the remote destination
server.

This would not be as efficient as simply using SCP from one host to another,
as of course NiFi will be pulling a copy locally, then pushing.  So if that is
all you need, I would just set up a cron job to do the scp periodically.  But if
you might want to tap into the data feed to send some of it to a certain system,
or so you can alter the data in some way/filter things out/etc., then NiFi
will be a great answer.

Thanks
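For the plain-copy case, a minimal cron sketch (schedule, paths, user, and host are all placeholders):

```
# crontab entry (illustrative): recursively copy the tree once an hour
0 * * * * scp -r -q /data/outbound user@remote-host:/data/inbound
```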

On Thu, May 11, 2023 at 8:48 AM Eric Secules  wrote:

> Hi Ben,
>
> To clarify, do you want to move files from the same server nifi is running
> on to an sftp server?
>
> -Eric
>
> On Thu., May 11, 2023, 1:58 a.m. Ben .T.George, 
> wrote:
>
>> Hello,
>>
>> How can we scp a directory with many subfolders and files to a remote SFTP
>> server?
>>
>> Regards,
>> Ben
>>
>


Re: Can we access Queued Duration as an attribute?

2024-02-15 Thread Joe Witt
This [1] blog seems amazingly appropriate, and wow, do we need these (and any
such fields we intend to truly honor) documented in a prominent place in the
docs.  Super useful...

[1] https://jameswing.net/nifi/nifi-internal-fields.html

Thanks

On Thu, Feb 15, 2024 at 8:35 AM Mark Payne  wrote:

> Jim,
>
> You can actually reference “lastQueueDate” in Expression Language. It is
> formatted as number of milliseconds since epoch.
>
> So you might have a RouteOnAttribute that has a property named “old” with
> a value of:
> ${lastQueueDate:lt( ${now():minus(10000)} )}
>
> So any FlowFile that has been queued for more than 10 seconds would be
> routed to “old”, anything else to “unmatched”
>
> Thanks
> -Mark
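Outside NiFi, the same comparison (epoch milliseconds, with a 10 000 ms threshold per the ten-second example above) can be sanity-checked like this — a sketch, not NiFi code:

```python
def is_old(last_queue_date_ms: int, now_ms: int,
           threshold_ms: int = 10_000) -> bool:
    """Python mimic of the EL check: true when the FlowFile was last queued
    more than threshold_ms before 'now' (both are epoch milliseconds)."""
    return last_queue_date_ms < now_ms - threshold_ms

now_ms = 1_700_000_000_000                   # any reference instant, ms since epoch
assert is_old(now_ms - 15_000, now_ms)       # queued 15 s ago -> routes to "old"
assert not is_old(now_ms - 2_000, now_ms)    # queued 2 s ago  -> "unmatched"
```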
>
>
> On Feb 15, 2024, at 10:18 AM, James McMahon  wrote:
>
> That would work - what a good suggestion. I'll do that. I can format the
> resulting number and then RouteOnAttribute by the desired subset of the
> result.
> Something like this to set attribute dt.failure:
>
> ${now():toNumber():toDate("yyyy-MM-dd HH:mm:ss"):format("yyyyMMddHHmmss","EST")}
> Then I can effectively route the files.
> Thank you Jim S.
>
> On Thu, Feb 15, 2024 at 9:55 AM Jim Steinebrey 
> wrote:
>
>> You could add an UpdateAttribute processor first in the failure path to
>> add a new attribute which contains the time the error occurred by using the
>> ${now()} or ${now():toNumber()} expression language function.
>>
>> https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#now
>>
>> Then later on in the flow you can compare current time to the saved error
>> time to see how much time has elapsed.
>>
>> — Jim
>>
>>
>> On Feb 15, 2024, at 9:44 AM, James McMahon  wrote:
>>
>> As it turns out lineageStartDate and Queued Duration are very different.
>> Without being able to get at Queued Duration as an attribute, it appears we
>> cannot RouteOnAttribute to filter thousands in a queue by anything like
>> hours they have been in queue.
>> Why would this be helpful? Let us say we have an InvokeHttp processor
>> making calls to a REST endpoint. We leave for a weekend and return to find
>> 5000 files in the Failure queue from this processor. It would be most
>> helpful to identify the start time and end time of these 5000 failures. We
>> can't do that reviewing only the first 100 flowfiles in the queue from the
>> UI.
>> One can make an assumption that all of these 5000 flowfiles that failed
>> InvokeHttp share a similar range of lineageStartDate, but that will not
>> necessarily be true depending on flow complexity.
>>
>> On Wed, Feb 14, 2024 at 9:49 AM James McMahon 
>> wrote:
>>
>>> What a great workaround, thank you once again Mike. I'll put this in and
>>> use it now.
>>> Jim
>>>
>>> On Tue, Feb 13, 2024 at 4:41 PM Michael Moser 
>>> wrote:
>>>
 Hello James,

 I'm not aware of a way to access Queued Duration using expression
 language, but you can access the Lineage Duration information.  The Getting
 Started Guide mentions both entryDate and lineageStartDate as immutable
 attributes on all flowfiles.  These are numbers of milliseconds since
 epoch.  If you need them in a readable format, you can use the format()
 function.

 simple examples:
 ${entryDate} = 1707859943778
 ${lineageStartDate} = 1707859943778
 ${lineageStartDate:format("yyyy-MM-dd HH:mm:ss.SSS")} = 2024-02-13
 21:32:23.778

 -- Mike
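
The epoch-milliseconds-to-readable-date conversion above is easy to verify outside NiFi; a Python sketch using the same value from the examples (the formatting here assumes the server clock is UTC):

```python
from datetime import datetime, timezone

# entryDate / lineageStartDate are epoch milliseconds; EL's format()
# corresponds roughly to strftime-style formatting here.
millis = 1707859943778
dt = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
print(dt.strftime("%Y-%m-%d %H:%M:%S.") + f"{millis % 1000:03d}")
# 2024-02-13 21:32:23.778
```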


 On Mon, Feb 12, 2024 at 11:38 AM James McMahon 
 wrote:

> When we examine the contents of a queue through the UI and select a
> flowfile from the resulting list, we see FlowFile Details in the Details
> tab. Are those key/values accessible from nifi expression language? I 
> would
> like to access Queued Duration. I have a queue that holds flowfiles with
> non-successful return codes for calls to REST services, and I want to 
> route
> depending on how long these flowfiles have been sitting in my error queue
> to isolate the window when the REST service was unavailable.
> Thank you for any examples that show how we can access these keys and
> values.
>

>>
>


Re: AuditService NPE After Upgrading to 1.24.0

2023-12-19 Thread Joe Witt
Agreed - thanks for the details.  Can you create a JIRA with it while
you're at it?  Either way it will get attention I'm sure.

Thanks

On Tue, Dec 19, 2023 at 10:18 AM Shawn Weeks 
wrote:

> If I revert to 1.23.2 the issue goes away. Seems like there might be a bug
> in the H2 to Xodus migration
>
> On Dec 19, 2023, at 9:44 AM, Shawn Weeks 
> wrote:
>
> Poked around in Jira and the migration guide and couldn’t find anything
> really mentioning this, but I tried to upgrade a 1.21.0 cluster to 1.24.0
> and it won’t start after the upgrade, complaining that it can’t instantiate
> the auditService bean. Tracked it down to the EntityStoreAuditService class
> and something about destination name on the connection but I’m not sure
> what connection it’s talking about.
>
> Thanks
> Shawn
>
> Caused by:
> org.springframework.beans.factory.UnsatisfiedDependencyException: Error
> creating bean with name 'securityFilterChain' defined in
> org.apache.nifi.web.security.configuration.WebSecurityConfiguration:
> Unsatisfied dependency expressed through method 'securityFilterChain'
> parameter 2; nested exception is
> org.springframework.beans.factory.UnsatisfiedDependencyException: Error
> creating bean with name
> 'org.apache.nifi.web.security.configuration.JwtAuthenticationSecurityConfiguration':
> Unsatisfied dependency expressed through constructor parameter 2; nested
> exception is org.springframework.beans.factory.BeanCreationException: Error
> creating bean with name 'flowController' defined in class path resource
> [nifi-context.xml]: Cannot resolve reference to bean 'auditService' while
> setting bean property 'auditService'; nested exception is
> org.springframework.beans.factory.BeanCreationException: Error creating
> bean with name 'auditService' defined in
> org.apache.nifi.web.NiFiWebApiConfiguration: Bean instantiation via factory
> method failed; nested exception is
> org.springframework.beans.BeanInstantiationException: Failed to instantiate
> [org.apache.nifi.admin.service.AuditService]: Factory method 'auditService'
> threw exception; nested exception is java.lang.NullPointerException
> at
> org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:801)
> at
> org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:536)
> at
> org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateUsingFactoryMethod(AbstractAutowireCapableBeanFactory.java:1352)
> at
> org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1195)
> at
> org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:582)
> at
> org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542)
> at
> org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:335)
> at
> org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)
> at
> org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:333)
> at
> org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208)
> at
> org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:276)
> at
> org.springframework.beans.factory.support.DefaultListableBeanFactory.addCandidateEntry(DefaultListableBeanFactory.java:1609)
> at
> org.springframework.beans.factory.support.DefaultListableBeanFactory.findAutowireCandidates(DefaultListableBeanFactory.java:1573)
> at
> org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveMultipleBeans(DefaultListableBeanFactory.java:1462)
> at
> org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1349)
> at
> org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1311)
> at
> org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredMethodElement.resolveMethodArguments(AutowiredAnnotationBeanPostProcessor.java:816)
> ... 54 common frames omitted
> Caused by:
> org.springframework.beans.factory.UnsatisfiedDependencyException: Error
> creating bean with name
> 'org.apache.nifi.web.security.configuration.JwtAuthenticationSecurityConfiguration':
> Unsatisfied dependency expressed through constructor parameter 2; nested
> exception is org.springframework.beans.factory.BeanCreationException: Error
> creating bean with name 'flowController' defined in class path resource

Re: NiFi custom extension and deploy phase on Maven

2023-12-09 Thread Joe Witt
Etienne,

Are you using our nifi poms as your parent pom of your extensions?  I do
wonder if perhaps we should offer one that doesn't tie to all our ASF-isms

Thanks

On Sat, Dec 9, 2023 at 10:54 AM Etienne Jouvin 
wrote:

> Hello all;
>
> Since now, I never push my extensions to any Nexus or Artifactory.
> But now I must do it.
>
> The Maven deploy fails, and I noticed it comes from the
> configuration for the plugin nexus-staging-maven-plugin.
>
> The configuration comes from the artifact nifi, where we can find the
> following configuration.
>
> <plugin>
>     <groupId>org.sonatype.plugins</groupId>
>     <artifactId>nexus-staging-maven-plugin</artifactId>
>     <version>1.6.13</version>
>     <extensions>true</extensions>
>     <configuration>
>         <stagingProgressTimeoutMinutes>15</stagingProgressTimeoutMinutes>
>         <serverId>repository.apache.org</serverId>
>         <nexusUrl>https://repository.apache.org/</nexusUrl>
>     </configuration>
> </plugin>
>
>
> Two things:
> I did not have any issue when I wanted to push my extension as a SNAPSHOT
> version, except that I had to configure the enforcer plugin not to fail the build.
>
> And to be able to push my extensions in my custom Artifactory, I had to
> disable it with the following configuration, in the build node of my root
> pom.xml file:
>
> <plugin>
>     <groupId>org.sonatype.plugins</groupId>
>     <artifactId>nexus-staging-maven-plugin</artifactId>
>     <configuration>
>         <skipNexusStagingDeployMojo>true</skipNexusStagingDeployMojo>
>     </configuration>
> </plugin>
>
>
> What do you think about putting your build configuration inside a profile?
> That way, we would not have to add a custom configuration.
>
> Regards
>
> Etienne Jouvin
>
>
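
Etienne's profile idea would look roughly like this in the parent pom (the profile id is illustrative, and the staging configuration is the one already quoted above):

```xml
<profiles>
    <profile>
        <id>apache-release</id>
        <build>
            <plugins>
                <plugin>
                    <groupId>org.sonatype.plugins</groupId>
                    <artifactId>nexus-staging-maven-plugin</artifactId>
                    <!-- staging configuration as in the current build section -->
                </plugin>
            </plugins>
        </build>
    </profile>
</profiles>
```

Downstream builds that never activate the profile would then have no staging configuration to override.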


Re: New Apache NiFi Website Design Launched

2024-01-08 Thread Joe Witt
Thanks so much to all involved and David for driving this to completion!

It looks great

On Mon, Jan 8, 2024 at 11:17 AM David Handermann <
exceptionfact...@apache.org> wrote:

> Team,
>
> Thanks to a collaborative effort from several designers and
> developers, the Apache NiFi project website has a new look, with more
> prominent links to downloads, documentation, and source code!
>
> https://nifi.apache.org
>
> There is more work to be done in particular areas like generated
> documentation, but as with the project itself, the website is open for
> collaborative input through Jira [1] and GitHub [2].
>
> Regards,
> David Handermann
> Apache NiFi PMC Member
>
> [1] https://issues.apache.org/jira/browse/NIFI
> [2] https://github.com/apache/nifi-site
>


Re: Finding slow down in processing

2024-01-10 Thread Joe Witt
Aaron,

The usual suspects are memory consumption leading to high GC leading to
lower performance over time, or back pressure in the flow, etc.. But your
description does not really fit either exactly.  Does your flow see a mix
of large objects and smaller objects?

Thanks

On Wed, Jan 10, 2024 at 10:07 AM Aaron Rich  wrote:

> Hi all,
>
>
>
> I’m running into an odd issue and hoping someone can point me in the right
> direction.
>
>
>
> I have NiFi 1.19 deployed in a Kube cluster with all the repositories
> volume mounted out. It was processing great with processors like
> UpdateAttribute sending through 15K/5m PutFile sending through 3K/5m.
>
>
>
> With nothing changing in the deployment, the performance has dropped to
> UpdateAttribute doing 350/5m and Putfile to 200/5m.
>
>
>
> I’m trying to determine what resource is suddenly dropping our performance
> like this. I don’t see anything on the Kube monitoring that stands out and
> I have restarted, cleaned repos, changed nodes but nothing is helping.
>
>
>
> I was hoping there is something from the NiFi POV that can help identify
> the limiting resource. I'm not sure if there is additional
> diagnostic/debug/etc information available beyond the node status graphs.
>
>
>
> Any help would be greatly appreciated.
>
>
>
> Thanks.
>
>
>
> -Aaron
>


Re: ExecuteStreamCommand failing to unzip incoming flowfiles

2024-01-31 Thread Joe Witt
I went ahead and wrote it up here
https://issues.apache.org/jira/browse/NIFI-12709

Thanks

On Wed, Jan 31, 2024 at 10:30 AM James McMahon  wrote:

> Happy to do that Joe. How do I create and submit a JIRA for consideration?
> I have not done one - at least, not for years.
> If you get me started, I will do a concise and thorough description in the
> ticket.
> Sincerely,
> Jim
>
> On Wed, Jan 31, 2024 at 12:12 PM Joe Witt  wrote:
>
>> James,
>>
>> Makes sense to create a JIRA to improve UnpackContent to extract these
>> attributes in the event of a zip file that happens to present them.  The
>> concept of lastModifiedDate does appear easily accessed if available in the
>> metadata.  Owner/Creator/Creation information looks less standard in the
>> case of a Zip but perhaps still capturable as extra fields.
>>
>> Thanks
>>
>> On Wed, Jan 31, 2024 at 10:01 AM James McMahon 
>> wrote:
>>
>>> I tried to use UnpackContent to extract the files within a zip file
>>> named ABC DEF (1).zip. (the filename has spaces in its name).
>>>
>>> UnpackContent seemed to work, but it did not preserve file attributes
>>> from the files in the zip. For example, the  lastModifiedTime   is not
>>> available so downstream I am unable to do
>>> this: 
>>> ${file.lastModifiedTime:toDate("yyyy-MM-dd'T'HH:mm:ssZ"):format("yyyyMMddHHmmss")}
>>>
>>> I did some digging and found that on the UnpackContent page, it says:
>>> file.lastModifiedTime  "The date and time that the unpacked file was
>>> last modified (*tar only*)."
>>>
>>> I need these file attributes for those files I extract from the zip. So
>>> as an alternative I tried configuring an ExecuteStreamCommand processor
>>> like this:
>>> Command Arguments  -c;"unzip -p -q < -"
>>> Command Path  /bin/bash
>>> Argument Delimiter   ;
>>>
>>> It throws these errors:
>>>
>>> 16:41:30 UTCERROR13023d28-6154-17fd-b4e8-7a30b35980ca
>>> ExecuteStreamCommand[id=13023d28-6154-17fd-b4e8-7a30b35980ca] Failed to
>>> write flow file to stdin due to Broken pipe: java.io.IOException: Broken
>>> pipe 16:41:30 UTCERROR13023d28-6154-17fd-b4e8-7a30b35980ca
>>> ExecuteStreamCommand[id=13023d28-6154-17fd-b4e8-7a30b35980ca] Transferring
>>> flow file FlowFile[filename=ABC DEF (1).zip] to nonzero status. Executable
>>> command /bin/bash ended in an error: /bin/bash: -: No such file or directory
>>>
>>> It does not seem to be applying the unzip to the stdin of the ESC
>>> processor. None of the files in the zip archive are output from ESC.
>>>
>>> What needs to be changed in my ESC configuration?
>>>
>>> Thank you in advance for any help.
>>>
>>>
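
For reference, the per-entry last-modified timestamps discussed in this thread are stored in the zip metadata and can be read from a script (for example via ExecuteScript or a custom processor); a hedged Python sketch, not the NiFi implementation:

```python
import io
import zipfile
from datetime import datetime

def entry_mtimes(zip_bytes):
    """Return {entry_name: 'yyyyMMddHHmmss'} from a zip's stored
    last-modified timestamps."""
    result = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            result[info.filename] = datetime(*info.date_time).strftime("%Y%m%d%H%M%S")
    return result

# Build a tiny in-memory zip to demonstrate.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(zipfile.ZipInfo("a.txt", date_time=(2024, 1, 31, 16, 41, 30)), b"hi")
print(entry_mtimes(buf.getvalue()))
# {'a.txt': '20240131164130'}
```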


Re: ExecuteStreamCommand failing to unzip incoming flowfiles

2024-01-31 Thread Joe Witt
James,

Makes sense to create a JIRA to improve UnpackContent to extract these
attributes in the event of a zip file that happens to present them.  The
concept of lastModifiedDate does appear easily accessed if available in the
metadata.  Owner/Creator/Creation information looks less standard in the
case of a Zip but perhaps still capturable as extra fields.

Thanks

On Wed, Jan 31, 2024 at 10:01 AM James McMahon  wrote:

> I tried to use UnpackContent to extract the files within a zip file named
> ABC DEF (1).zip. (the filename has spaces in its name).
>
> UnpackContent seemed to work, but it did not preserve file attributes from
> the files in the zip. For example, the  lastModifiedTime   is not available
> so downstream I am unable to do
> this: 
> ${file.lastModifiedTime:toDate("yyyy-MM-dd'T'HH:mm:ssZ"):format("yyyyMMddHHmmss")}
>
> I did some digging and found that on the UnpackContent page, it says:
> file.lastModifiedTime  "The date and time that the unpacked file was last
> modified (*tar only*)."
>
> I need these file attributes for those files I extract from the zip. So as
> an alternative I tried configuring an ExecuteStreamCommand processor like
> this:
> Command Arguments  -c;"unzip -p -q < -"
> Command Path  /bin/bash
> Argument Delimiter   ;
>
> It throws these errors:
>
> 16:41:30 UTCERROR13023d28-6154-17fd-b4e8-7a30b35980ca
> ExecuteStreamCommand[id=13023d28-6154-17fd-b4e8-7a30b35980ca] Failed to
> write flow file to stdin due to Broken pipe: java.io.IOException: Broken
> pipe 16:41:30 UTCERROR13023d28-6154-17fd-b4e8-7a30b35980ca
> ExecuteStreamCommand[id=13023d28-6154-17fd-b4e8-7a30b35980ca] Transferring
> flow file FlowFile[filename=ABC DEF (1).zip] to nonzero status. Executable
> command /bin/bash ended in an error: /bin/bash: -: No such file or directory
>
> It does not seem to be applying the unzip to the stdin of the ESC
> processor. None of the files in the zip archive are output from ESC.
>
> What needs to be changed in my ESC configuration?
>
> Thank you in advance for any help.
>
>


Re: Restarting Nifi Cluster Systems to add new user

2024-04-30 Thread Joe Witt
Hello

Can you share more about which authentication and authorization provider
you're using with NiFi today?

Also would be good to share which other authentication/authorization
providers your organization could leverage as then it is a question of
which ones are supported out of the box.

Thanks
Joe

On Tue, Apr 30, 2024 at 2:53 PM Shamsudeen Jameer <
shamsudeen.jam...@prth.com> wrote:

> Hello all, I have a NiFi cluster with 3 nodes that requires a
> restart of all systems when adding a new user. If any of the nodes are
> online it won’t pick up the new configs.  Basically, NiFi will try to keep
> the running config active at all times. Since I have 3 cluster nodes, I
> need to shut down all of them at the same time after adding the new user.
> Is there anything that can be suggested to prevent the shutdown of all
> systems when adding a new user?
> Thanks!
>
> PROPRIETARY / CONFIDENTIALITY NOTICE
>
> This message (including any attachments) contains information that is
> proprietary and confidential to PRIORITY TECHNOLOGY HOLDINGS, INC. and its
> affiliates and subsidiaries, including the sender hereof, and is for the
> sole use of the intended recipients. If you are not an intended recipient,
> you may not read, print, retain, use, copy, distribute, forward or disclose
> to anyone this message or any information contained in this message
> (including any attachments). If you have received this message in error,
> please advise the sender of this error by reply e-mail, and please destroy
> all copies of this message (including any attachments).
>


Re: Restarting Nifi Cluster Systems to add new user

2024-04-30 Thread Joe Witt
Thanks - take a look at your conf/authorizers file and see which authorizer
is being used.

Based on what you said it sounds
like org.apache.nifi.ldap.tenants.LdapUserGroupProvider

There are properties related to how aggressively it will search for
information from the directory server.  The default appears to be 30
minutes.

Is this potentially the issue?

Thanks
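
If the LdapUserGroupProvider sync interval is the cause, it is configurable in conf/authorizers.xml; a fragment sketch ("Sync Interval" is the real property name, but the identifier and value here are only examples, and the connection/search properties are elided):

```xml
<userGroupProvider>
    <identifier>ldap-user-group-provider</identifier>
    <class>org.apache.nifi.ldap.tenants.LdapUserGroupProvider</class>
    <!-- ...connection and search properties... -->
    <property name="Sync Interval">10 mins</property>
</userGroupProvider>
```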

On Tue, Apr 30, 2024 at 3:20 PM Shamsudeen Jameer <
shamsudeen.jam...@prth.com> wrote:

> Hello Joe,
> It's through AD. But it isn't controlled via groups. I ended up having to
> add the user through the UI (which will normally be overwritten).  I'm
> running version 1.23.2.
>
> Regards,
> *Shamsudeen Jameer*Data Operations ManagerPriority
> shamsudeen.jam...@prth.como: (516) 345-5015 M: (917) 374-6599‒‒
>
>
>
> On Tue, Apr 30, 2024 at 6:01 PM Joe Witt  wrote:
>
>> Hello
>>
>> Can you share more about which authentication and authorization provider
>> you're using with NiFi today?
>>
>> Also would be good to share which other authentication/authorization
>> providers your organization could leverage as then it is a question of
>> which ones are supported out of the box.
>>
>> Thanks
>> Joe
>>
>> On Tue, Apr 30, 2024 at 2:53 PM Shamsudeen Jameer <
>> shamsudeen.jam...@prth.com> wrote:
>>
>>> Hello all,I have a nifi cluster system with 3 nodes that require a
>>> restart of all systems when adding a new user. If any of the nodes are
>>> online it won’t pick up the new configs.  Basically, nifi will try to keep
>>> the running config active at all times. Since I have 3 cluster nodes, I
>>> need to shut down all of them at the same time after adding the new user.
>>> Is there anything that can be suggested to prevent the shutdown of  all
>>> systems when adding a new user?
>>> Thanks!
>>>
>>> PROPRIETARY / CONFIDENTIALITY NOTICE
>>>
>>> This message (including any attachments) contains information that is
>>> proprietary and confidential to PRIORITY TECHNOLOGY HOLDINGS, INC. and its
>>> affiliates and subsidiaries, including the sender hereof, and is for the
>>> sole use of the intended recipients. If you are not an intended recipient,
>>> you may not read, print, retain, use, copy, distribute, forward or disclose
>>> to anyone this message or any information contained in this message
>>> (including any attachments). If you have received this message in error,
>>> please advise the sender of this error by reply e-mail, and please destroy
>>> all copies of this message (including any attachments).
>>>
>>
> PROPRIETARY / CONFIDENTIALITY NOTICE
>
> This message (including any attachments) contains information that is
> proprietary and confidential to PRIORITY TECHNOLOGY HOLDINGS, INC. and its
> affiliates and subsidiaries, including the sender hereof, and is for the
> sole use of the intended recipients. If you are not an intended recipient,
> you may not read, print, retain, use, copy, distribute, forward or disclose
> to anyone this message or any information contained in this message
> (including any attachments). If you have received this message in error,
> please advise the sender of this error by reply e-mail, and please destroy
> all copies of this message (including any attachments).
>


Re: java.lang.OutOfMemoryError: Java heap space : NIFI 1.23.2

2024-03-19 Thread Joe Witt
Hello

The key output is

java.lang.OutOfMemoryError: Java heap space

Review batch property options to limit response sizes in the database calls.

Thanks
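
Joe's suggestion maps to the batch-related properties already listed below; illustrative values (these numbers are assumptions to tune against your heap and row width, not recommendations from the thread):

```
Max Rows Per Flow File   10000   <- split the result set across many FlowFiles
Output Batch Size        1       <- transfer each FlowFile as soon as it is full
Fetch Size               1000    <- hint the JDBC driver to stream rows
```

With Max Rows Per Flow File left at 0, ExecuteSQL emits the entire result set as a single FlowFile, which can pressure the heap on large tables.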


On Tue, Mar 19, 2024 at 6:15 AM  wrote:

> Hello
>
> I have an ExecuteSQL processor that runs the SQL command "select * from
> public.table1".
> It is a PostgreSQL database.
>
> Here are the last properties of the processor:
>
> Max Wait Time 0 seconds
> Normalize Table/Column Names false
> Use Avro Logical Types false
> Compression Format NONE
> Default Decimal Precision 10
> Default Decimal Scale 0
> Max Rows Per Flow File 0
> Output Batch Size 0
> Fetch Size: 20
> Set Auto Commit :false
>
>
> The same SQL command works in NiFi 1.16.3 with the same configuration,
> so I don't know why it fails now. Thanks
>
> I need your help; there is a strange error:
>
> 2024-03-19 12:58:37,683 ERROR [Load-Balanced Client Thread-6]
> org.apache.nifi.engine.FlowEngine Uncaught Exception in Runnable task
> java.lang.OutOfMemoryError: Java heap space
> 2024-03-19 12:58:37,684 ERROR [Load-Balanced Client Thread-2]
> org.apache.nifi.engine.FlowEngine Uncaught Exception in Runnable task
> java.lang.OutOfMemoryError: Java heap space
> at
> java.base/java.util.concurrent.CopyOnWriteArrayList.iterator(CopyOnWriteArrayList.java:1024)
> at
> java.base/java.util.concurrent.CopyOnWriteArraySet.iterator(CopyOnWriteArraySet.java:389)
> at
> org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClientTask.run(NioAsyncLoadBalanceClientTask.java:54)
> at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
> at
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at
> java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
> 2024-03-19 12:58:48,940 INFO [main-EventThread]
> o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
> 2024-03-19 12:58:49,347 ERROR [Timer-Driven Process Thread-9]
> org.apache.nifi.engine.FlowEngine Uncaught Exception in Runnable task
> java.lang.OutOfMemoryError: Java heap space
> 2024-03-19 12:58:49,351 INFO [NiFi Web Server-5835]
> org.apache.nifi.web.server.RequestLog 138.21.169.37 - CN=admin.plants,
> OU=NIFI [19/Mar/2024:12:58:48 +] "GET /nifi-api/flow/cluster/summary
> HTTP/1.1" 200 104 "
> https://nifi-01:9091/nifi/?processGroupId=0f426c92-018e-1000--3fca1e11=;
> "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML,
> like Gecko) Chrome/105.0.0.0 Safari/537.36"
> 2024-03-19 12:58:48,559 ERROR [Load-Balanced Client Thread-3]
> org.apache.nifi.engine.FlowEngine Uncaught Exception in Runnable task
> java.lang.OutOfMemoryError: Java heap space
> 2024-03-19 12:58:49,745 INFO [NiFi Web Server-5419]
> org.apache.nifi.web.server.RequestLog 138.21.169.37 - -
> [19/Mar/2024:12:58:49 +] "POST
> /rb_bf28073qyu?type=js3=v_4_srv_8_sn_B1E58A3741D2949DD454A88FF8A4BAF3_perc_10_ol_0_mul_1_app-3A4e195de4d0714591_1_app-3A44074a8878754fd3_1_app-3Ad1603d0792f56d4b_1_app-3A8092cfc902bb1761_1_rcs-3Acss_1=8=post=QKHPLOMPDHAQNPHMHHTUOHHKWJHFRNCG-0=1710851904127=https%3A%2F%2Fnifi-01%3A9091%2Fnifi%2F%3FprocessGroupId%3D0f426c92-018e-1000--3fca1e11%26componentIds%3D=3=d1603d0792f56d4b=1268897524=7xpdnw1j=1
> HTTP/1.1" 200 109 "
> https://nifi-01:9091/nifi/?processGroupId=0f426c92-018e-1000--3fca1e11=;
> "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML,
> like Gecko) Chrome/105.0.0.0 Safari/537.36"
> 2024-03-19 12:58:57,419 ERROR [Timer-Driven Process Thread-5]
> org.apache.nifi.engine.FlowEngine Uncaught Exception in Runnable task
> java.lang.OutOfMemoryError: Java heap space
> 2024-03-19 12:58:50,209 WARN [NiFi Web Server-5689]
> o.a.n.c.l.e.CuratorLeaderElectionManager Unable to determine leader for
> role 'Cluster Coordinator'; returning null
> org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /nifi/clu_quality_2/leaders/Cluster
> Coordinator
> at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
> at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
> at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:2480)
> at
> org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:243)
> at
> org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:232)
> at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:94)
> at
> 

Re: Nifi on RockyLinux

2024-06-14 Thread Joe Witt
Deepak

In Apache we don't have mechanisms to verify/validate that but indications
are that it should work just fine.

Thanks

On Fri, Jun 14, 2024 at 10:47 AM Chirthani, Deepak Reddy <
c-deepakreddy.chirth...@charter.com> wrote:

> Hi,
>
>
>
> Writing this email to get confirmation of whether Apache NiFi is
> supported on the Rocky Linux distribution.
>
>
>
> Thanks
>
> Deepak
> The contents of this e-mail message and any attachments are intended
> solely for the addressee(s) and may contain confidential and/or legally
> privileged information. If you are not the intended recipient of this
> message or if this message has been addressed to you in error, please
> immediately alert the sender by reply e-mail and then delete this message
> and any attachments. If you are not the intended recipient, you are
> notified that any use, dissemination, distribution, copying, or storage of
> this message or any attachment is strictly prohibited.
>


Re: FlattenJSON fails on large json file

2024-06-14 Thread Joe Witt
James

You may be able to use alternative JSON components such as those with
record readers/writes.

You could certainly write a nifi processor in either Java or Python that
would do this and be super efficient.

The processor you've chosen just isn't very flexible in regards to larger
objects and how it uses memory.

Thanks
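
If the end goal is only to tabulate key occurrences, flattening the whole document into one giant string is avoidable; a Python sketch of the counting itself (json.loads still holds the parsed tree in memory, so for a 9 GB file this would need to be paired with an incremental parser such as ijson, which is an assumption beyond the stdlib):

```python
import json
from collections import Counter

def count_keys(obj, counts):
    """Recursively tally every JSON object key, the statistic the
    flatten-then-count approach was after."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            counts[k] += 1
            count_keys(v, counts)
    elif isinstance(obj, list):
        for v in obj:
            count_keys(v, counts)

counts = Counter()
count_keys(json.loads('{"a": {"b": 1}, "c": [{"b": 2}, {"d": 3}]}'), counts)
print(dict(counts))
# {'a': 1, 'b': 2, 'c': 1, 'd': 1}
```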

On Fri, Jun 14, 2024 at 11:13 AM James McMahon  wrote:

> Thanks Eric. So then this in the error message - java.lang.OutOfMemoryError
> - isn't really to be taken at face value. FlattenJson tried to index an
> array that exceeded the maximum value of an integer, and it choked.
>
> An 8 GB file really isn't that large. I'm hoping someone has encountered
> this before and will weigh in with a reply.
>
> On Fri, Jun 14, 2024 at 2:08 PM Eric Secules  wrote:
>
>> Hi James,
>>
>> I don't have a solution for you off the top of my head. But I can tell
>> you the failure is because you've got an array longer than the maximum
>> value of an Int. So, memory is not the limiting factor.
>>
>> -Eric
>>
>> On Fri, Jun 14, 2024, 10:59 AM James McMahon 
>> wrote:
>>
>>> I have a json file, incoming.json. It is 9 GB in size.
>>>
>>> I want to flatten the json so that I can tabulate the number of times
>>> each key appears. Am using a FlattenJson 2.0.0-M2 processor, with
>>> this configuration:
>>>
>>> Separator   .
>>> Flatten Mode  normal
>>> Ignore Reserved Characters  false
>>> Return Typeflatten
>>> Character Set  UTF-8
>>> Pretty Print JSON   true
>>>
>>> This processor has worked so far on json files as large as 2 GB. But
>>> this 9 GB one is causing this issue:
>>>
>>> FlattenJson[id=ea2650e2-8974-1ff7-2da9-a0f2cd303258] Processing halted: 
>>> yielding [1 sec]: java.lang.OutOfMemoryError: Required array length 
>>> 2147483639 + 9 is too large
>>>
>>>
>>> htop confirms I have 92 GB or memory on my EC2 instance, and the NiFi heap 
>>> shows it has 88GB of that dedicated for its use.
>>>
>>>
>>> How can I handle large json files in this processor? It would seem that 
>>> breaking the file up is not an option because it will violate the integrity 
>>> of the json structure most likely.
>>>
>>>
>>> What options do I have?
>>>
>>>

