Re: ReplaceText - Out of memory - Requested array size exceeds VM limit

2020-08-11 Thread Joe Witt
Asmath

ReplaceText either loads full lines at a time or loads the entire file into
memory.  So keep that in mind.

If you need something that only loads at worst 1-2x the length of the
replacement string you're interested in then I'd recommend just using a
scripted processor that does precisely what you need for now.  You can
stream from the input and stream to the output, achieving extremely
efficient memory usage for arbitrarily large inputs.
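For illustration, here is a minimal sketch (plain Python, not the NiFi scripting API) of the streaming approach described above: it buffers at most one chunk plus the length of the search term, so memory stays bounded no matter how large the input is. The function and parameter names are hypothetical; in an actual scripted processor (e.g. ExecuteScript) the input/output streams would come from the framework's StreamCallback.

```python
import io

def stream_replace(reader, writer, search: bytes, replace: bytes,
                   chunk_size: int = 64 * 1024):
    """Replace occurrences of `search` while holding at most roughly
    chunk_size + len(search) bytes in memory."""
    keep = len(search) - 1
    buf = b""
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            writer.write(buf.replace(search, replace))
            return
        buf += chunk
        # Everything before `cut` is safe to emit; the last `keep` bytes
        # might be the start of a match continuing into the next chunk.
        cut = max(len(buf) - keep, 0) if keep else len(buf)
        # Extend `cut` past any complete match it would otherwise split.
        idx = buf.find(search, max(0, cut - keep))
        while idx != -1 and idx < cut:
            cut = max(cut, idx + len(search))
            idx = buf.find(search, idx + len(search))
        writer.write(buf[:cut].replace(search, replace))
        buf = buf[cut:]

# Example: an input far larger than the 8-byte read buffer
src = io.BytesIO(b"header abc middle abc footer")
dst = io.BytesIO()
stream_replace(src, dst, b"abc", b"XYZ", chunk_size=8)
```

The carry-over of the last `len(search) - 1` bytes is what lets matches straddle chunk boundaries without ever buffering the whole file.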

Thanks

On Tue, Aug 11, 2020 at 1:01 PM KhajaAsmath Mohammed <
mdkhajaasm...@gmail.com> wrote:

> Hi ,
>
> I have a file that is throwing an error when I look for a particular string
> and replace it with another string.
>
> Requested array size exceeds VM limit
>
> Any suggestions for this? File is around 800 MB.
>
> Thanks,
> Asmath
>


Re: FetchSFTP: Rename file on move

2020-08-11 Thread Joe Witt
Jairo

You can use a PutSFTP after Fetch to place it where you want.
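One way to do the renaming Jairo asks about is to set a new filename attribute before the PutSFTP, e.g. with UpdateAttribute and an expression along the lines of `${filename}-${now():format('yyyyMMddHHmmss')}`. As a rough sketch of what such an expression computes (the helper below is illustrative Python, not NiFi code):

```python
from datetime import datetime

def timestamped_name(filename: str, ts: datetime) -> str:
    """Append a timestamp suffix before the extension so repeated
    transfers of the same source name never collide."""
    stem, dot, ext = filename.rpartition(".")
    suffix = ts.strftime("%Y%m%d%H%M%S")
    return f"{stem}-{suffix}.{ext}" if dot else f"{filename}-{suffix}"

print(timestamped_name("export.csv", datetime(2020, 8, 11, 15, 16, 0)))
# -> export-20200811151600.csv
```

Because the suffix changes on every transfer, files moved into the destination folder are never overwritten even though the source name is constant.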

Thanks

On Tue, Aug 11, 2020 at 3:16 PM Jairo Henao 
wrote:

> Hi community,
>
> Is there a way to rename a file before moving it with FetchSFTP?
>
> After processing a file, I need to move it to a folder and add a timestamp
> suffix to it. The file in the source always has the same name, but I need
> the moved files not to be overwritten.
>
> Any ideas or is it necessary to request a modification to the processor?
>
> Thanks
>
> --
> Jairo Henao
> @jairohenaorojas
>
>


Re: FetchSFTP Failed to fetch content for StandardFlowFileRecord

2020-08-12 Thread Joe Witt
These scenarios are ripe for I/O race conditions between the sending process
(the thing putting files in the directory) and the receiving process
(NiFi).  It is vital to ensure either that the writer changes the name when
done writing and the reader only looks for that pattern (best case), OR
that the file is of sufficient age before grabbing it (usually good but not
foolproof).
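The "change the name when done writing" convention can be sketched as follows on the writer side (plain Python, hypothetical names); the reader — e.g. ListSFTP with a File Filter Regex — would then only pick up names that don't match the in-progress pattern:

```python
import os
import tempfile

def publish_atomically(directory: str, final_name: str, data: bytes) -> str:
    """Write to a dot-prefixed temporary name, fsync, then rename.
    A reader polling the directory never observes a partially written file."""
    fd, tmp_path = tempfile.mkstemp(prefix=".inflight-", dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())       # data durable before it becomes visible
        final_path = os.path.join(directory, final_name)
        os.replace(tmp_path, final_path)  # atomic on POSIX filesystems
        return final_path
    except BaseException:
        os.unlink(tmp_path)
        raise
```

The rename is atomic within one filesystem, so the file appears under its final name only once it is complete — exactly the handoff Joe describes as the best case.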

thanks

On Wed, Aug 12, 2020 at 7:18 AM Valentina Ivanova 
wrote:

> Hi Phillip,
>
> I am leaning toward the solution you are proposing - I can't identify the
> problem.
>
> I am not using ListSFTP but executing a small script to obtain the list
> of files using ExecuteStreamCommand.
>
> Thanks & all the best
>
> Valentina
>
> --
> *From:* Phillip Grenier 
> *Sent:* Wednesday, 12 August 2020 16:02
> *To:* users@nifi.apache.org 
> *Subject:* Re: FetchSFTP Failed to fetch content for
> StandardFlowFileRecord
>
> Valentina,
>
> I have seen cases where the server will delete files after they are
> touched by the List(S)FTP processor, so if the fetch processor isn't fast
> enough the file gets deleted, causing that issue. If the file really does
> still exist on the server then I would try a simple retry loop and see if
> a second attempt is successful.
>
> On Wed, Aug 12, 2020 at 8:08 AM Valentina Ivanova 
> wrote:
>
> Hi all!
>
> I am retrieving thousands of files with FetchSFTP and from time to
> time (say 1 per 10 min) I receive the following error.
>
> FetchSFTP[id=e19a32eb-f9c6-13a0-38b4-45a726ec4273] Failed to fetch content
> for
> StandardFlowFileRecord[uuid=82f316a2-f797-4d32-8aa8-3d1907b54e01,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1597232401901-271,
> container=default, section=271],
> offset=515807, length=8690914],offset=8630853,name=1576278001264,size=109]
> from filename [FILENAME] on remote host xxx.xxx.xxx.xxx:xxx due to
> java.io.IOException: Failed to obtain file content for [FILENAME]; routing
> to comms.failure: java.io.IOException:
> Failed to obtain file content for [FILENAME]
>
> The files are all there and are similar in size and content to the others
> successfully fetched. Any idea what the problem could be?
>
> Thanks & best
>
> Valentina
>


Re: Nifi takes too long to start(~ 30 minutes)

2020-08-13 Thread Joe Witt
Mohit,

You almost certainly want to take that same flow and setup a cluster on a
more recent version of NiFi to compare startup times.  For flows with
thousands of components there are important improvements which have
occurred in the past year and a half.

Startup time, user-perceived behavior in the UI during continuous operations,
etc. have been improved. Further, you can now hot-load new versions of NARs,
which should reduce the need to restart.

We also will have 1.12 out hopefully within days so that could be
interesting for you as well.

Thanks

On Thu, Aug 13, 2020 at 7:18 AM Mohit Jain  wrote:

> Hi Team,
>
> I am using a single node NiFi 1.9.0 cluster. It takes more than 30 minutes
> to start each time it is restarted. There are more than 100 flows on the
> NiFi UI with an average of 25 processors per flow. It takes around 25-30
> minutes to reach the cluster election process after which it gets started
> in a minute.
>
> Is this an expected behaviour that startup time is directly proportional
> to the number of processors in the Canvas? Or is there a way to reduce the
> NiFi startup time?
>
> Any leads would be appreciated.
>
> Thanks,
> Mohit
>


Re: Nifi takes too long to start(~ 30 minutes)

2020-08-13 Thread Joe Witt
...true and I believe that has also been improved.  But could be wrong on
that.

On Thu, Aug 13, 2020 at 9:26 AM Brandon DeVries 
wrote:

> Mohit,
>
> How many flowfiles are currently on the instance?  Sometimes a very large
> number of flowfiles can result in slower start times.
>
> Brandon
>
> --
> *From:* Pierre Villard 
> *Sent:* Thursday, August 13, 2020 12:10:51 PM
> *To:* users@nifi.apache.org 
> *Subject:* Re: Nifi takes too long to start(~ 30 minutes)
>
> Hi,
>
> I'm surprised this is something you observe in the election process part.
> I've constantly seen quick startup times even with thousands of components
> in the flow.
> I'd look into the logs and maybe turn on some debug logs to find out
> what's going on.
>
> Pierre
>
> Le jeu. 13 août 2020 à 16:33, Joe Witt  a écrit :
>
> Mohit,
>
> You almost certainly want to take that same flow and setup a cluster on a
> more recent version of NiFi to compare startup times.  For flows with
> thousands of components there are important improvements which have
> occurred in the past year and a half.
>
> Startup time, user perceived behavior in the UI on continuous operations,
> etc.. have been improved. Further you can now hot load new versions of nars
> which should reduce the need to restart.
>
> We also will have 1.12 out hopefully within days so that could be
> interesting for you as well.
>
> Thanks
>
> On Thu, Aug 13, 2020 at 7:18 AM Mohit Jain 
> wrote:
>
> Hi Team,
>
> I am using a single node NiFi 1.9.0 cluster. It takes more than 30 minutes
> to start each time it is restarted. There are more than 100 flows on the
> NiFi UI with an average of 25 processors per flow. It takes around 25-30
> minutes to reach the cluster election process after which it gets started
> in a minute.
>
> Is this an expected behaviour that startup time is directly proportional
> to the number of processors in the Canvas? Or is there a way to reduce the
> NiFi startup time?
>
> Any leads would be appreciated.
>
> Thanks,
> Mohit
>
>


Re: Detect duplicate records

2020-08-16 Thread Joe Witt
I believe Robert's case is that he has records flowing through, bundled in
flowfiles containing one or more of them at a time, and he'd like to know
at a per-record level (regardless of the flowfile they're contained in)
whether that record has already been seen over some time interval.

DetectDuplicate wired into an appropriate record processor would be optimal
for this.  A scripted processor could be used now, whereas longer term we
need to add a DetectDuplicateRecord processor or possibly wire this into
one of the existing processors.
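Until a DetectDuplicateRecord processor exists, a scripted per-record check might look like the sketch below (plain Python; the dict stands in for a DistributedMapCache or Redis lookup, and the hashing and TTL choices are assumptions for illustration):

```python
import hashlib
import json

def dedupe_records(records, cache, now=0.0, ttl_seconds=3600.0):
    """Return only the records not seen within the TTL window, updating
    the cache (record hash -> last-seen time) as we go."""
    fresh = []
    for rec in records:
        # Stable key over the record's content, independent of field order
        key = hashlib.sha256(
            json.dumps(rec, sort_keys=True).encode()).hexdigest()
        last_seen = cache.get(key)
        if last_seen is None or now - last_seen > ttl_seconds:
            fresh.append(rec)
        cache[key] = now
    return fresh
```

Wired between a record reader and writer, this keeps flowfiles intact while filtering duplicates record by record, avoiding the split/merge round trip.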

Thanks

On Sun, Aug 16, 2020 at 12:52 AM Jens M. Kofoed 
wrote:

> So Robert, to understand it correctly: you have a lot of records in one
> flow file, and if one record has been seen before that record should be
> removed?
> If true: wouldn’t it be a workflow that goes through all records, record by
> record, and joins the final result? So first you would have to split all
> records, check each record and join the rest - no matter if you do it inside
> or outside NiFi, right?
> Split records -> hash record -> detect duplicates -> merge records
>
> Regards Jens.
>
> Den 16. aug. 2020 kl. 01.17 skrev Robert R. Bruno :
>
> Yep we were leaning towards offloading it to an external program and then
> putting data back to nifi for final delivery.  Looks like that will be best
> from the sounds of it.  Again thanks all!
>
> On Sat, Aug 15, 2020, 16:24 Josh Friberg-Wyckoff 
> wrote:
>
>> If that is the case and this is high volume like you say, I would think
>> it would be more efficient to offload the task to a separate program than
>> having a processor for NiFi doing it.
>>
>> On Sat, Aug 15, 2020, 2:52 PM Otto Fowler 
>> wrote:
>>
>>> I was working on something for this, but in discussion with some of the
>>> SMEs on the project, decided to shelve it.  I don’t think I had gotten to
>>> the point of a jira.
>>>
>>> https://apachenifi.slack.com/archives/C0L9S92JY/p1589911056303500
>>>
>>> On August 15, 2020 at 14:12:07, Robert R. Bruno (rbru...@gmail.com)
>>> wrote:
>>>
>>> Sorry I should have been more clear.  My need is to detect if each
>>> record has been seen in the past.  So I need a solution that would be able
>>> to go record by record against something like a redis cache that would tell
>>> me either first time the record was seen or not and update the cache
>>> accordingly.  Guessing nothing like that for records exists at this point?
>>>
>>> We've used DetectDuplicate to do this for entire flow files, but have
>>> the need to do this per record with a preference of not splitting the flow
>>> files.
>>>
>>> Thanks all!
>>>
>>> On Sat, Aug 15, 2020, 13:38 Jens M. Kofoed 
>>> wrote:
>>>
 Just some info about DISTINCT. In MySQL a union is much, much faster
 than a DISTINCT. The DISTINCT creates a new temp table with the result of
 the query, sorting it and removing duplicates.
 If you make a union with a select id=-1, the result is exactly the
 same - all duplicates are removed. A DISTINCT which takes 2 min. 45 sec.
 only takes about 15 sec. with a union.
 kind regards.

 I don't know which engine is in NiFi.
 Jens M. Kofoed

 Den lør. 15. aug. 2020 kl. 18.08 skrev Matt Burgess <
 mattyb...@apache.org>:

> In addition to the SO answer, if you know all the fields in the
> record, you can use QueryRecord with SELECT DISTINCT field1,field2...
> FROM FLOWFILE. The SO answer might be more performant but is more
> complex, and QueryRecord will do the operations in-memory so it might
> not handle very large flowfiles.
>
> The current pull request for the Jira has not been active and is not
> in mergeable shape, perhaps I'll get some time to pick it up and get
> it across the finish line :)
>
> Regards,
> Matt
>
> On Sat, Aug 15, 2020 at 11:47 AM Josh Friberg-Wyckoff
>  wrote:
> >
> > Gosh, I should search the NiFi resources first.  They have a current
> JIRA for what you are wanting.
> > https://issues.apache.org/jira/browse/NIFI-6047
> >
> > On Sat, Aug 15, 2020 at 10:35 AM Josh Friberg-Wyckoff <
> j...@thefribergs.com> wrote:
> >>
> >> This looks interesting as well.
> >>
> https://stackoverflow.com/questions/52674532/remove-duplicates-in-nifi
> >>
> >> On Sat, Aug 15, 2020 at 10:23 AM Josh Friberg-Wyckoff <
> j...@thefribergs.com> wrote:
> >>>
> >>> In theory I would think you could use the ExecuteStreamCommand to
> use the builtin Operating System sort commands to grab unique records.  
> The
> Windows Sort command has an undocumented unique option.  The sort command
> on Linux distros also has a unique option as well.
> >>>
> >>> On Sat, Aug 15, 2020 at 5:53 AM Robert R. Bruno 
> wrote:
> 
>  I wanted to see if anyone knew is there a clever way to detect
> duplicate records much like you can with entire flow files with
> DetectDuplicate?  I'd really rather 

[ANNOUNCE] Apache NiFi 1.12.0 release

2020-08-20 Thread Joe Witt
Hello

The Apache NiFi team would like to announce the release of Apache NiFi
1.12.0.

This release includes over 330 bug fixes, improvements and many new
features.

Apache NiFi is an easy to use, powerful, and reliable system to process and
distribute data.  Apache NiFi was made for dataflow.  It supports highly
configurable directed graphs of data routing, transformation, and system
mediation logic.

More details on Apache NiFi can be found here:
https://nifi.apache.org/

The release artifacts can be downloaded from here:
https://nifi.apache.org/download.html

Maven artifacts have been made available and mirrored as per normal ASF
artifact processes.

Issues closed/resolved for this list can be found here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12346778

Release note highlights can be found here:
https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.12.0

Thank you
The Apache NiFi team


Re: Data performance with FlowFile Repo's RocksDB

2020-09-10 Thread Joe Witt
Ryan

By far the largest performance relevant activity is flow design itself.  As
a last resort I'd look at repo changes.

Are you using the record processors?  Does your data arrive in batches?

Thanks

On Thu, Sep 10, 2020 at 7:27 AM Ryan Hendrickson <
ryan.andrew.hendrick...@gmail.com> wrote:

> Hi all,
>I've got a NiFi running with a lot of small JSON files and I'm trying
> to squeeze the most performance out of it.
>
>I recently saw the new RocksDB FlowFile Repo (
> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#rocksdb-flowfile-repository)
> and was wondering what kind, if any, performance gains we could expect out
> of it.
>
> Thanks,
> Ryan
>


Re: Content Claims Filling Disk - Best practice for small files?

2020-09-17 Thread Joe Witt
Ryan

What version are you using? I do think we had an issue that kept items
around longer than intended that has been addressed.

Thanks

On Thu, Sep 17, 2020 at 7:58 AM Ryan Hendrickson <
ryan.andrew.hendrick...@gmail.com> wrote:

> Hello,
> I've got ~15 million FlowFiles, each roughly 4KB, totaling about 55GB of
> data on my canvas.
>
> However, the content repository (on its own partition) is completely full
> with 350GB of data.  I'm pretty certain the way Content Claims store the
> data is responsible for this.  In previous experience, we've had files that
> are larger, and haven't seen this as much.
>
> My guess is that as data was streaming through and being added to a claim,
> it isn't always released as the small files leave the canvas.
>
> We've run into this issue enough times that I figure there's probably a
> "best practice for small files" for the content claims settings.
>
> These are our current settings:
>
> nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
> nifi.content.claim.max.appendable.size=1 MB
> nifi.content.claim.max.flow.files=100
> nifi.content.repository.directory.default=/var/nifi/repositories/content
> nifi.content.repository.archive.max.retention.period=12 hours
> nifi.content.repository.archive.max.usage.percentage=50%
> nifi.content.repository.archive.enabled=true
> nifi.content.repository.always.sync=false
>
>
> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#content-repository
>
>
> There's 1024 folders on the disk (0-1023) for the Content Claims.
> Each file inside the folders is roughly 2 MB to 8 MB (which is odd
> because I thought the max appendable size would make them no larger than
> 1 MB).
>
> Is there a way to expand the number of folders and/or reduce the amount of
> individual FlowFiles that are stored in the claims?
>
> I'm hoping there might be a best practice out there though.
>
> Thanks,
> Ryan
>
>
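The small-files behavior described above can be illustrated with a toy packing model (a simplification for intuition, not NiFi's actual FileSystemRepository logic): contents are appended into a shared claim file until either `nifi.content.claim.max.appendable.size` or `nifi.content.claim.max.flow.files` would be exceeded, and a claim can only be reclaimed once every flowfile referencing it has left the flow — so one lingering 4 KB flowfile can pin the whole claim on disk.

```python
def pack_claims(flowfile_sizes, max_claim_bytes=1_000_000, max_flowfiles=100):
    """Greedily append each content into the current claim, closing the
    claim when either the byte or flowfile-count limit would be exceeded."""
    claims, current, current_bytes = [], [], 0
    for size in flowfile_sizes:
        if current and (current_bytes + size > max_claim_bytes
                        or len(current) >= max_flowfiles):
            claims.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        claims.append(current)
    return claims

# 1000 x 4KB flowfiles: the 100-flowfile cap closes a ~400KB claim each time
claims = pack_claims([4096] * 1000)
```

In this model the flowfile-count cap, not the byte cap, is what closes each claim for 4 KB contents, which is why tuning both properties together matters for small-file workloads.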


Re: Content Claims Filling Disk - Best practice for small files?

2020-09-17 Thread Joe Witt
can you share your flow.xml.gz?

On Thu, Sep 17, 2020 at 8:08 AM Ryan Hendrickson <
ryan.andrew.hendrick...@gmail.com> wrote:

> 1.12.0
>
> Thanks,
> Ryan
>
> On Thu, Sep 17, 2020 at 11:04 AM Joe Witt  wrote:
>
>> Ryan
>>
>> What version are you using? I do think we had an issue that kept items
>> around longer than intended that has been addressed.
>>
>> Thanks
>>
>> On Thu, Sep 17, 2020 at 7:58 AM Ryan Hendrickson <
>> ryan.andrew.hendrick...@gmail.com> wrote:
>>
>>> Hello,
>>> I've got ~15 million FlowFiles, each roughly 4KB, totally in about 55GB
>>> of data on my canvas.
>>>
>>> However, the content repository (on it's own partition) is
>>> completely full with 350GB of data.  I'm pretty certain the way Content
>>> Claims store the data is responsible for this.  In previous experience,
>>> we've had files that are larger, and haven't seen this as much.
>>>
>>> My guess is that as data was streaming through and being added to a
>>> claim, it isn't always released as the small files leaves the canvas.
>>>
>>> We've run into this issue enough times that I figure there's probably a
>>> "best practice for small files" for the content claims settings.
>>>
>>> These are our current settings:
>>>
>>> nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
>>> nifi.content.claim.max.appendable.size=1 MB
>>> nifi.content.claim.max.flow.files=100
>>> nifi.content.repository.directory.default=/var/nifi/repositories/content
>>> nifi.content.repository.archive.max.retention.period=12 hours
>>> nifi.content.repository.archive.max.usage.percentage=50%
>>> nifi.content.repository.archive.enabled=true
>>> nifi.content.repository.always.sync=false
>>>
>>>
>>> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#content-repository
>>>
>>>
>>> There's 1024 folders on the disk (0-1023) for the Content Claims.
>>> Each file inside the folders are roughly  2MB to 8 MB (Which is odd
>>> because I thought the max appendable size would make this no larger than
>>> 1MB.)
>>>
>>> Is there a way to expand the number of folders and/or reduce the amount
>>> of individual FlowFiles that are stored in the claims?
>>>
>>> I'm hoping there might be a best practice out there though.
>>>
>>> Thanks,
>>> Ryan
>>>
>>>
>>


Re: NiFi V1.9.2 Performance

2020-09-23 Thread Joe Witt
Nathan

You have plenty of powerful machines to hit super high speeds but what I
cannot tell is how the disks are set up, capability- and layout-wise,
relative to our three repos of importance.  You'll need to share those
details.

That said, the design of the flow matters.  The Kafka processors that
aren't record oriented will perform poorly unless they're acquiring data in
their natural batches as they arrive from Kafka.  In short, use the
record-oriented processors for Kafka.  With them you can even deal with the
fact you want to go from Avro to JSON and so on.  These processors have a
tougher learning curve but they perform extremely well and we have powerful
processors to go along with them for common patterns.

You absolutely should be able to get to the big numbers you have seen.  It
requires great flow design (powerful machines are secondary).
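To make the "natural batches" point concrete: a record-oriented processor transforms an entire batch of records inside one flowfile in a single pass, instead of emitting one flowfile per Kafka message. A rough Python analogy (the record shape is invented for illustration):

```python
import json

def batch_to_json_lines(records):
    """One incoming flowfile carrying N records -> one outgoing flowfile
    of JSON lines, amortizing per-flowfile overhead across the batch."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)

# One flowfile carrying a 2000-record batch, not 2000 single-record flowfiles
batch = [{"offset": i, "topic": "events"} for i in range(2000)]
payload = batch_to_json_lines(batch)
```

The performance difference comes from bookkeeping: repository updates, provenance events, and scheduling all happen once per flowfile, so carrying 2000 records per flowfile cuts that overhead by three orders of magnitude versus record-per-flowfile.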

Thanks

On Wed, Sep 23, 2020 at 9:26 AM  wrote:

> Hi All,
>
>
>
> We’ve got a NiFi 3 Node Cluster running on 3 x 40 CPU, 256GB RAM (32G Java
> Heap) servers. However, we have only been able to achieve a consumption of
> ~9.48GB Consumption Compressed (38.53GB Uncompressed) over 5 minutes, with
> a production rate of ~16.84GB out of the cluster over  5 mins. This is much
> lower than we were expecting based on what we have read. With this
> throughput we see a CPU load ~32 on all nodes, so we know there isn’t much
> else we can get out of the CPU).
>
>
>
> We have also tried SSDs, Raided and Unraided HDDs for the content repo
> storage, but they haven’t made a difference to the amount we can process.
>
>
>
> The process is as follows:
>
> 1.   Our flow reads from Kafka Compressed (Maximum of 2000 records
> per file). It then converts them from Avro to JSON. (ConsumeKafka_0_10 ->
> UpdateAttribute -> ConvertRecord)
>
> 2.   Depending on which topic the flow file is consumed from, we then
> send the message to one of 10 potential process groups, each containing
> between 3 and 5 processors within the process groups. (RouteOnAttribute ->
> Relevant Processing Group containing JoltTransformJSON and several custom
> processors we have made).
>
> 3.   Finally, we produce the flow file content back to one of several
> Kafka topics, based on the input topic name in Avro format with Snappy
> compression on the Kafka topic.
>
>
>
> Inspecting the queued message counts, it indicates that the Jolt
> Transforms are taking the time to process (Large queues before JOLT
> processors, small or no queues afterwards). But I’m not sure why this is
> any worse than the rest of the processors as the event duration is less
> than a second when inspecting in provenance? We have tuned the number of
> concurrent tasks, duration and schedules to get the performance we have so
> far.
>
>
>
> I’m not sure if there is anything anyone could recommend or suggest to try
> and make improvements? We need to achieve a rate around 5x of what it’s
> currently processing with the same number of nodes. We are running out of
> ideas on how to accomplish this and may have to consider alternatives.
>
>
>
> Kind Regards,
>
>
>
> Nathan
>


Re: NiFi V1.9.2 Performance

2020-09-23 Thread Joe Witt
Nathan

Not sure what read/write rates you'll get in these RAID-10 configs but
generally this seems like it should be fine (100s of MB/sec per node range
at least).  Whereas now you're seeing about 20MB/sec/node.  This is
definitely very low.

If you review
http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-kafka-2-6-nar/1.12.0/org.apache.nifi.processors.kafka.pubsub.ConsumeKafkaRecord_2_6/index.html
then you'll see that we do actually capture attributes such as kafka.topic
and so on.  Flowfiles would also be properly grouped by that.  What I'm not
positive of is whether it could handle reading from multiple topics at the same
time while also honoring and determining each of their distinct schemas.
Would need to test/verify that scenario to be sure.  If you do have a bunch
of topics and they could grow/change then keeping this single processor
approach makes sense.  If you can go the route of one ConsumeKafkaRecord
per topic then obviously that would work well.

Not seeing your flow though I cannot be certain where the bottleneck(s)
exist and provide guidance.  But this is without a doubt a vital skill to
achieving maximum performance.

You'd have to show/share a ton more details for folks here to be helpful in
walking through the full design.  Or explain the end to end flow.

As an additional food for thought if the flows are indeed 'from kafka -> do
stuff -> back to kafka' this is likely a great use case for stateless-nifi.

Thanks

On Wed, Sep 23, 2020 at 10:43 AM  wrote:

> Hi Joe,
>
>
>
> Thanks for getting back to me so quickly.
>
>
>
> Our disk setup is as follows:
>
> Path      Storage Type                   Format  Capacity  Content
> /         100GB OS SSD                   ext4    89.9GB    OS, NiFi install, logs
> /data/1/  2 x 4TB SAS HDDs in RAID 1     ext4    3.7TB     Database and FlowFile repos
> /data/2/  8 x 4TB SAS HDDs in RAID 10    ext4    14.6TB    Content repo
> /data/3/  2 x 4TB SAS HDDs in RAID 1     ext4    3.7TB     Provenance repo
> /ssd      1 x 4TB PCIe NVMe SSD          ext4    3.7TB     Content repo (used instead of /data/2/ as a
>                                                            test, to see if CPU was bottlenecked by disk
>                                                            operations)
>
>
>
> I will certainly take a look at those. One question with the consume
> record processor is how I would consume from multiple topics and ensure the
> correct Avro schema is used to deserialise the message? We have 1:1 mapping
> of schemas to topics. At the moment the ConsumeKafka processor is reading
> from all topics in one consumer. I’m assuming the kafka.topic
> attribute doesn’t exist at this stage? We use the Avro Schema Registry
> Controller as we don’t have a schema registry in place yet.
>
>
>
> Kind Regards,
>
>
>
> Nathan
>
>
>
> *From:* Joe Witt [mailto:joe.w...@gmail.com]
> *Sent:* 23 September 2020 17:33
> *To:* users@nifi.apache.org
> *Subject:* Re: NiFi V1.9.2 Performance
>
>
>
> Nathan
>
>
>
> You have plenty powerful machines to hit super high speeds but what I
> cannot tell is how the disks are setup/capability and layout wise and
> relative to our three repos of importance.  You'll need to share those
> details.
>
>
>
> That said, the design of the flow matters.  The Kafka processors that
> aren't record oriented will perform poorly unless they're acquiring data in
> their natural batches as they arrive from kafka.  In short, use the record
> oriented processors from Kafka.  In it you can even deal with the fact you
> want to go from AVRO to Json and so on.  These processors have a tougher
> learning curve but they perform extremely well and we have powerful
> processors to go along with them for common patterns.
>
>
>
> You absolutely should be able to get to the big numbers you have seen.  It
> requires great flow design (powerful machines are secondary).
>
>
>
> Thanks
>
>
>
> On Wed, Sep 23, 2020 at 9:26 AM  wrote:
>
> Hi All,
>
>
>
> We’ve got a NiFi 3 Node Cluster running on 3 x 40 CPU, 256GB RAM (32G Java
> Heap) servers. However, we have only been able to achieve a consumption of
> ~9.48GB Consumption Compressed (38.53GB Uncompressed) over 5 minutes, with
> a production rate of ~16.84GB out of the cluster over  5 mins. This is much
> lower than we were expecting based on what we have read. With this
> throughput we see a CPU load ~32 on all nodes, so we know there isn’t much
> else we can get out of the CPU).
>
>
>
> We have also tried SSDs, Raided and Unraided HDDs for the content repo
> storage, but they haven’t made a difference to th

Re: Unable to rollback changes with Nifi Registry

2020-10-05 Thread Joe Witt
Juan

Probably best to create a jira with all the details you provided here so
someone can look into it when available

Thanks

On Mon, Oct 5, 2020 at 12:00 PM Juan Pablo Gardella <
gardellajuanpa...@gmail.com> wrote:

> Hi all,
>
> Does anyone know about this issue? What does it mean?
>
> Thanks,
> Juan
>
>
> On Sun, 4 Oct 2020 at 11:25, Juan Pablo Gardella <
> gardellajuanpa...@gmail.com> wrote:
>
>> Hi all,
>>
>> I am starting to play with Nifi Registry. I am using Nifi 1.12.1 and Nifi
>> Registry 0.7.0 almost with all default configurations. No security on both.
>> My flow has some custom processors and services which were generated by
>> nifi-nar-maven-plugin/1.3.2.
>>
>> I was able to create a bucket in Nifi Registry and started the version
>> control and committed some changes.  The problem appeared when I tested
>> rolling back local changes, and the following error appears in the UI:
>>
>> Found bundle com.foo:myprocessr-nar:0.1.0-SNAPSHOT but does not support
>> com.foo.MyService
>>
>> There are no errors in Nifi Registry and Nifi logs. A 409 conflict issue
>> appears in the browser developer console: POST
>> http://localhost:8000/nifi-api/versions/revert-requests/process-groups/e8ff381c-0174-1000-b8ce-873a9ad9bd47
>> operation.
>>
>> I uploaded two NAR files to Nifi Registry using upload bundle
>> 
>> to test if that was the problem, but it didn't work either.
>>
>> I cannot find too much in google, anyone has an idea what does this error
>> mean? and also, how I can fix it? Notice the processors and services are
>> working without issues on Nifi. Both NAR files were packed using
>>
>> I tried changing "allowBundleRedeploy" and "allowPublicRead" to true, but it
>> still failed (they are created with false).
>>
>> Thanks
>>
>>


Re: sslcontext certs

2020-10-14 Thread Joe Witt
Michael,

There is not any specific way supported or intended to combine the context
used by NiFi's own HTTP server with those that would be used by processors
within the flow.

However, using parameter contexts here is a great way to ensure you have
only a single place to update for flow internals.  If those values are
parameterized it should work out nicely.

Thanks

On Wed, Oct 14, 2020 at 11:34 AM Michael Di Domenico 
wrote:

> i have a nifi server with several listenhttp modules on different
> ports.  each one has an sslcontext within it that uses the same certs
> as the main 443 instance.
>
> sadly i changed the cert when it expired on the 443 port, but failed to
> change the sslcontext on the other ports.  is there a way to tell the
> sslcontext on the other ports to just use the same cert that's on the
> 443 port?
>
> what i'm trying to avoid having to do is change the filename in all
> the contexts to point to the new cert, i'd rather change it in one
> place and have everything else pick it up
>
> using a symlink on the filesystem seemed like one way, but i thought
> there might be a way to do it in nifi
>


Re: Long Polling Client

2020-10-14 Thread Joe Witt
Clay

Have you evaluated whether InvokeHTTP will give you the desired behavior
for your case - in particular with a long timeout perhaps?  If you have and
it doesn't do the job do you mean something which initiates a request to an
HTTP server then assumes the response will remain open and it should take
portions of the response and treat each as its own flowfile/object to pass
along?

Thanks

On Tue, Oct 13, 2020 at 6:31 PM Clay Teahouse 
wrote:

> Does NiFi have a processor that can act as a client for a long polling
> server, for example an SSE server?
> More specifically, I want a client that can issue a HTTP GET request to a
> long polling server and accept stream of messages from the server (on the
> same connection).
> If there isn't one, which processor is best to extend to achieve this goal?
>
> thanks.
>


Re: Long Polling Client

2020-10-14 Thread Joe Witt
Right, makes sense.  They're waiting for the completion of the response as
the payload to pass on.  If we need incremental handling of the response
body then we need to factor that into a given processor.  InvokeHttp or
something like it is a good candidate.  Almost a sort of 'InvokeHttpRecord'
kind of processor which uses the format/schema awareness of records to
frame the response objects and then based on some configurable value passes
those records on as a flow file.
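A sketch of the incremental framing such a hypothetical 'InvokeHttpRecord'-style processor would need (plain Python; the chunks stand in for pieces of the still-open HTTP response body, and the event format follows the Server-Sent Events convention of blank-line-separated `data:` fields):

```python
def iter_sse_events(chunks):
    """Yield complete SSE events as they become available, even when a
    single event spans several network chunks."""
    buf = ""
    for chunk in chunks:
        buf += chunk
        # A blank line terminates an event; anything after it stays buffered.
        while "\n\n" in buf:
            raw, buf = buf.split("\n\n", 1)
            data = [line[len("data:"):].lstrip()
                    for line in raw.splitlines() if line.startswith("data:")]
            if data:
                yield "\n".join(data)

# Events arrive split across arbitrary chunk boundaries
events = list(iter_sse_events(["data: one\n\nda", "ta: two\nda", "ta: more\n\n"]))
```

Each yielded event could then become its own flowfile without waiting for the server to ever close the response — which is the piece plain InvokeHTTP doesn't do today.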

Thanks

On Wed, Oct 14, 2020 at 1:25 PM Clay Teahouse 
wrote:

> Hello,
> I tried both getHTTP and invokeHTTP (but didn't try all options). What I
> need is to deal with the cases, such as SSE (Server Sent Events) which
> works with long polling. Meaning the client initiates a connection (via a
> HTTP request) to the SSE server, the server keeps the connection open (for
> as long as possible) and streams data back to the client on the same
> connection. I am able to have the data streamed to the client if I set up a
> stand alone HTTP client and issue HTTP requests but I am not able to get
> getHTTP or invokeHTTP work (I don't get any messages back).
>
> thanks.
>
> On Wed, Oct 14, 2020 at 2:50 PM Joe Witt  wrote:
>
>> Clay
>>
>> Have you evaluated whether InvokeHTTP will give you the desired behavior
>> for your case - in particular with a long timeout perhaps?  If you have and
>> it doesn't do the job do you mean something which initiates a request to an
>> HTTP server then assumes the response will remain open and it should take
>> portions of the response and treat each as its own flowfile/object to pass
>> along?
>>
>> Thanks
>>
>> On Tue, Oct 13, 2020 at 6:31 PM Clay Teahouse 
>> wrote:
>>
>>> Does NiFi have a processor that can act as a client for a long polling
>>> server, for example an SSE server?
>>> More specifically, I want a client that can issue a HTTP GET request to
>>> a long polling server and accept stream of messages from the server (on the
>>> same connection).
>>> If there isn't one, which processor is best to extend to achieve this
>>> goal?
>>>
>>> thanks.
>>>
>>
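
For what it's worth, the framing step such a processor would need (turning an open text/event-stream response into discrete events) can be sketched outside NiFi. This is an illustrative Python sketch of the SSE wire format; `parse_sse` is not a NiFi or library API:

```python
def parse_sse(lines):
    """Group the lines of a text/event-stream body into events.

    A blank line dispatches the accumulated event; lines starting
    with ':' are comments/keep-alives; multiple data: lines are
    joined with newlines, per the SSE wire format.
    """
    event = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:                    # blank line: dispatch the event
            if "data" in event:
                yield event
            event = {}
            continue
        if line.startswith(":"):        # comment / keep-alive, ignore
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):       # one leading space is stripped
            value = value[1:]
        if field == "data":
            event["data"] = (event["data"] + "\n" + value) if "data" in event else value
        else:
            event[field] = value


stream = [
    "event: update\n",
    'data: {"price": 42}\n',
    "\n",
    ": keep-alive\n",
    "data: hello\n",
    "data: world\n",
    "\n",
]
for evt in parse_sse(stream):
    print(evt)   # two events: the 'update' record, then a two-line data payload
```

Each yielded event could become one flowfile; the configurable batch size mentioned above would simply group several events before transfer.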


Re: Build Problem 1.11.4 on MacOS

2020-10-20 Thread Joe Witt
Darren,

I believe there were gremlins in that JDK release.. Can you please try
something like 265?

On Tue, Oct 20, 2020 at 8:52 AM Darren Govoni  wrote:

> Hi,
>   Seem to have this recurring problem trying to build on MacOS with
> nifi-utils. Anyone have a workaround or fix for this?
>
> Thanks in advance!
>
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.1:testCompile 
> (groovy-tests) on project nifi-utils: Compilation failure
> [ERROR] Failure executing groovy-eclipse compiler:
> [ERROR] Annotation processing got disabled, since it requires a 1.6 compliant 
> JVM
> [ERROR] Exception in thread "main" java.lang.NoClassDefFoundError: Could not 
> initialize class org.codehaus.groovy.vmplugin.v7.Java7
> [ERROR]   at 
> org.codehaus.groovy.vmplugin.VMPluginFactory.(VMPluginFactory.java:43)
>
> AFAIK my jvm is compliant
>
> dgovoni@C02RN8AHG8WP nifi % java -version
> openjdk version "1.8.0_262"
> OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_262-b10)
> OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.262-b10, mixed mode)
> dgovoni@C02RN8AHG8WP nifi %
>
>
>


Re: NiFi 1.11.4 HDFS/HBASE Processor Errors After Kerberos Ticket Expires

2020-10-21 Thread Joe Witt
If there is nothing in the logs but they stop working, I suspect the issue
is related to the default prompt for name.  Updating settings in bootstrap is
most likely needed.

Thanks

On Wed, Oct 21, 2020 at 3:36 PM Peter Turcsanyi 
wrote:

> Are there any exception stack traces in the log when the processors fail /
> before that?
>
> On Thu, Oct 22, 2020 at 12:28 AM  wrote:
>
>> Hello!
>>
>>
>>
>> We’re running into a problem with NiFi 1.11.4.
>>
>>
>>
>> Our HBASE/HDFS/Parquet processors are all configured with a master
>> KeytabCredentialsService that is pointing to a Kerberos principal and
>> keytab file.
>>
>>
>>
>> The environment’s /etc/krb5.conf file has the line renew_lifetime = 7d
>> commented out due to an issue with Java-OpenJDK (that is apparently fixed
>> but still shows up) causing “MESSAGE STREAM MODIFIED (41)” errors to appear
>> whenever we have it uncommented.
>>
>>
>>
>> When NiFi starts, it is able to kinit with the Kerberos KDC and is issued
>> a 24 hour ticket. Everything works fine right up until that ticket expires.
>> Once the ticket expires, all of our HDFS/HBASE/Parquet processors start
>> failing.
>>
>>
>>
>> I haven’t been able to find anything in our logs around the timeframe,
>> but I can’t turn on debug logging for this because the logs are
>> tremendously large when we do that (approximately 100-200 MB per minute and
>> the problem only occurs at the 24 hour mark).
>>
>>
>>
>> How would we go about troubleshooting this issue?
>>
>>
>>
>> Environment:
>>
>> Red Hat Enterprise Linux 7.9
>>
>> Apache NiFi 1.11.4
>>
>> java-11-openjdk 11.0.8.10-1.el7
>>
>>
>>
>> Thanks!
>>
>>
>>
>>
>>
>


Re: PutAzureBlobStorage OutOfMemoryError

2020-10-28 Thread Joe Witt
Eric

Your assumption is definitely what it should be.  This will need someone to
investigate.  Please file a jira with as much detail as you can.

Thanks

On Wed, Oct 28, 2020 at 12:40 PM Eric Secules  wrote:

> Hello everyone,
>
> I am trying to upload a 300 MB file to azure blob storage using
> PutAzureBlobStorage and the processor is failing due to an
> OutOfMemoryError. My JVM heap size is set to 512 MB, but I wouldn't expect
> this to be an issue because the PutAzureBlobStorage processor should be
> using streaming to send the file to azure in chunks rather than reading it
> into memory in its entirety and then sending it out.
>
> I am using NiFi version 1.12.1
>
> Here's the error from the NiFi logs:
>
>> 2020-10-28 19:34:10,717 ERROR [Timer-Driven Process Thread-6]
>> o.a.n.p.a.storage.PutAzureBlobStorage
>> PutAzureBlobStorage[id=74b80a47-016d-3430-fd74-ece7653158d5]
>> PutAzureBlobStorage[id=74b80a47-016d-3430-fd74-ece7653158d5] failed to
>> process session due to java.lang.OutOfMemoryError: Java heap space;
>> Processor Administratively Yielded for 1 sec: java.lang.OutOfMemoryError:
>> Java heap space
>> java.lang.OutOfMemoryError: Java heap space
>> 2020-10-28 19:34:10,717 WARN [Timer-Driven Process Thread-6]
>> o.a.n.controller.tasks.ConnectableTask Administratively Yielding
>> PutAzureBlobStorage[id=74b80a47-016d-3430-fd74-ece7653158d5] due to
>> uncaught Exception: java.lang.OutOfMemoryError: Java heap space
>> java.lang.OutOfMemoryError: Java heap space
>>
>
> What's the recommendation for using NiFi to upload files of this size to
> blob storage?
>
> Thanks,
> Eric
>
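
As a point of reference, the streaming behavior one would expect here (uploading in fixed-size blocks so peak memory is one block regardless of payload size) looks roughly like the following sketch. `upload_block` is a hypothetical stand-in for a real block-blob API call, not the Azure SDK:

```python
import hashlib
import io


def upload_in_blocks(src, upload_block, block_size=4 * 1024 * 1024):
    """Read src in fixed-size blocks and hand each one to upload_block.

    Peak memory is a single block regardless of total payload size,
    which is the behavior a blob-store put processor should exhibit.
    upload_block(index, data) stands in for the real API call.
    """
    index = 0
    while True:
        block = src.read(block_size)
        if not block:
            break
        upload_block(index, block)
        index += 1
    return index


# Simulate the upload: hash what "arrives" instead of sending it.
received = hashlib.sha256()
count = upload_in_blocks(io.BytesIO(b"x" * 10243),
                         lambda i, data: received.update(data),
                         block_size=1024)
print(count)  # 11: ten full 1 KiB blocks plus a 3-byte tail
```

With the block size fixed, a 300 MB payload needs the same peak memory as a 3 MB one.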


Re: There isnot show any items on Provenance GUI

2020-10-29 Thread Joe Witt
Thanks for following up with the info on how you solved it.  Glad it is
working now.

Thanks

On Thu, Oct 29, 2020 at 7:13 PM Yang Liang (Shanghai - Tech Dept. - Development)  wrote:

> Hi,
>
> The problem I'm having is the permission configuration.
>
> There should grant permission to node proxy and user to view provenance.
>
> Now the problem is solved
>
> Thanks
>
> Paul
>
>
>
> *From:* [mailto:liang.y...@feiniu.com]
>
> *Sent:* 29 October 2020, 9:21
> *To:* users@nifi.apache.org
>
> *Subject:* There isnot show any items on Provenance GUI
>
>
>
> Hello,
>
> I'm working on a two-node NiFi cluster with NiFi version 1.11.4, but the
> Provenance GUI does not show any items.
>
> There is my configuration about the Provenance:
>
>
>
> $ grep provenance /opt/nifi-1.11.4/conf/nifi.properties
>
>
> nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
>
> nifi.provenance.repository.debug.frequency=1_000_000
>
> nifi.provenance.repository.encryption.key.provider.implementation=
>
> nifi.provenance.repository.encryption.key.provider.location=
>
> nifi.provenance.repository.encryption.key.id=
>
> nifi.provenance.repository.encryption.key=
>
> nifi.provenance.repository.directory.default=/data2/provenance_repository1
>
>
> nifi.provenance.repository.directory.provenance2=/data2/provenance_repository2
>
> nifi.provenance.repository.max.storage.time=4 hours
>
> nifi.provenance.repository.max.storage.size=80 GB
>
> nifi.provenance.repository.rollover.time=30 secs
>
> nifi.provenance.repository.rollover.size=100 MB
>
> nifi.provenance.repository.query.threads=2
>
> nifi.provenance.repository.index.threads=2
>
> nifi.provenance.repository.compress.on.rollover=true
>
> nifi.provenance.repository.always.sync=false
>
> nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID,
> Filename, ProcessorID, Relationship
>
> nifi.provenance.repository.indexed.attributes=
>
> nifi.provenance.repository.index.shard.size=2 GB
>
> nifi.provenance.repository.max.attribute.length=65536
>
> nifi.provenance.repository.concurrent.merge.threads=2
>
> nifi.provenance.repository.buffer.size=500
>
>
>
> This nifi cluster has been running for 2 days. Below are the logs from
> when I query the SEND event; it should list some records on the Provenance
> GUI. What is the root cause of 'retrieved 0 events' and how can I resolve the
> issue?
>
>
>
> 2020-10-29 08:56:08,515 INFO
> org.apache.nifi.provenance.index.lucene.QueryTask: Successfully queried
> index /data2/provenance_repository2/lucene-8-index-1603930630342 for query
> +eventType:send +time:[160390080 TO 1603987199000]; retrieved 0 events
> with a total of 5 hits in 141 millis
>
> 2020-10-29 08:56:08,517 INFO
> org.apache.nifi.provenance.index.lucene.QueryTask: Successfully queried
> index /data2/provenance_repository1/lucene-8-index-1603929964785 for query
> +eventType:send +time:[160390080 TO 1603987199000]; retrieved 0 events
> with a total of 20787 hits in 125 millis
>
> 2020-10-29 08:56:08,655 INFO
> org.apache.nifi.provenance.index.lucene.QueryTask: Successfully queried
> index /data2/provenance_repository2/lucene-8-index-1603889124980 for query
> +eventType:send +time:[160390080 TO 1603987199000]; retrieved 0 events
> with a total of 22256 hits in 119 millis
>
> 2020-10-29 08:56:08,745 INFO
> org.apache.nifi.provenance.StandardQueryResult: Completed Query[ [SEND] ]
> comprised of 4 steps in 411 millis
>
> 2020-10-29 08:56:08,745 INFO
> org.apache.nifi.provenance.index.lucene.QueryTask: Successfully queried
> index /data2/provenance_repository1/lucene-8-index-1603888576272 for query
> +eventType:send +time:[160390080 TO 1603987199000]; retrieved 0 events
> with a total of 13792 hits in 202 millis
>
>
>
> Thanks
>
> Paul
>


Re: Provenance queries effect on processing

2020-11-03 Thread Joe Witt
Eric,

short version: Provenance queries can absolutely take away CPU time from
the flow.

longer version:
They get the same priority as any other thread in nifi.  Once prov queries
are being executed they use CPU.  I would strongly advise against any
blending of the flow execution with provenance queries. What you're trying
to do though is a great idea and aligns to what Mark Payne has talked about
previously as a job management mechanism.  This can/should be done without
provenance itself.


Thanks



On Tue, Nov 3, 2020 at 3:31 PM Eric Secules  wrote:

> Hello everyone,
>
> I was wondering if it is possible for excessive use of the provenance
> API to cause flowfile processing to slow down and even come to a halt?
> My test setup queries the provenance API to see if all flowfiles that
> descended from a given input file have completed processing. This leads to
> a lot of requests to the provenance API and I fear it may be too much for
> the test VM to handle (4 vCPU and 16GB RAM). Do provenance query threads
> take precedence over scheduled task threads?
>
> Thanks,
> Eric
>


Re: Authorization Framework

2020-11-04 Thread Joe Witt
Darren

You will want this thread on dev list to get traction.

Also please clarify if you mean authorization or whether you  mean
authentication.   I read all usages as meaning to discuss authentication.

thanks

On Wed, Nov 4, 2020 at 9:53 AM Darren Govoni  wrote:

> Greetings!
>
> We have an internal need to move to a specific PK based authorization for
> all our nifi processors. Currently, authorizations such as basic auth and
> kerberos seem to be wired directly inside the processors. My design
> approach to addressing our need also seeks to factor authorization out of
> processors, so that specific authorization handlers can be composed at
> config/run time, and to lighten the responsibilities inside processor classes.
>
> Towards this end, my initial design goals for this framework are thus:
>
> 1) Allow various kinds of authorization handlers to be written and added
> to processors without necessarily recoding the processor.
> 2) Allow for a pipeline effect where one or more authorizers might need to
> operate at the same time.
> 3) Do not disrupt existing processors that rely on their internal coding
> for authorization
> 4) Use appropriate design patterns to allow for flexible implementations
> of principals, credentials and other authorization assets.
> 5) Secure any clear text assets (usernames and passwords) in existing
> authorizations when moving them inside the framework.
>
> How does the community conduct initial design reviews of such changes? We
> would be quite a ways from contributing anything back but want to keep in
> sync with community practices and expectations to make such an offering
> immediately useful.
>
> Regards,
> Darren
>
>


Re: Provenance queries effect on processing

2020-11-04 Thread Joe Witt
Eric

Nope Im not aware of anything specific.

Thanks
Joe

On Wed, Nov 4, 2020 at 9:54 AM Eric Secules  wrote:

> Hello,
>
> I agree it's not the best idea to use the provenance data constantly to
> check when a test file is finally done being processed. Do you know if
> anything came out in 1.12.0 or 1.12.1 which would cause provenance queries
> to have a greater impact? We recently upgraded nifi among other changes and
> I'm trying to figure out what's the cause for many of our tests suddenly
> timing out.
>
> Thanks,
> Eric
>
> On Tue., Nov. 3, 2020, 6:18 p.m. Eric Secules,  wrote:
>
>> Hi Joe,
>>
>> Thanks for the explanation, is there a Jira ticket for for a job
>> management mechanism? Is this a priority for a coming release?
>>
>> Is there a lag between events occurring and them becoming searchable, if
>> so what settings help control this lag?
>>
>> Thanks,
>> Eric
>>
>> On Tue, Nov 3, 2020 at 2:35 PM Joe Witt  wrote:
>>
>>> Eric,
>>>
>>> short version: Provenance queries can absolutely take away CPU time from
>>> the flow.
>>>
>>> longer version:
>>> They get the same priority as any other thread in nifi.  Once prov
>>> queries are being executed they use CPU.  I would strongly advise against
>>> any blending of the flow execution with provenance queries. What you're
>>> trying to do though is a great idea and aligns to what Mark Payne has
>>> talked about previous as a job management mechanism.  This can/should be
>>> done without provenance itself.
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> On Tue, Nov 3, 2020 at 3:31 PM Eric Secules  wrote:
>>>
>>>> Hello everyone,
>>>>
>>>> I was wondering if it were possible for excessive use of the provenance
>>>> api would cause flowfile processing to slow down and even come to a halt?
>>>> My test setup queries the provenance API to see if all flowfiles that
>>>> descended from a given input file have completed processing. This leads to
>>>> a lot of requests to the provenance API and I fear it may be too much for
>>>> the test VM to handle (4 vCPU and 16GB RAM). Do provenance query threads
>>>> take precedence over scheduled task threads?
>>>>
>>>> Thanks,
>>>> Eric
>>>>
>>>


Re: Authorization Framework

2020-11-04 Thread Joe Witt
Darren

It's difficult to get to what you have in mind, as you keep saying
authorization but then giving examples of authentication protocols
(kerberos/keytabs, basic auth).

Let's focus though on your later comment about hdfs processors.  Take for
example put hdfs...it connects to an hdfs cluster to put data.  In terms
of the actual dataflow we get to authenticate/convey our identity to hdfs
and where we want to write data.  Hdfs then gets to accept or reject that.
 *That* is authorization.  Now then, speaking in terms of flow
administration in nifi we do have authorization scenarios.  Like who can
view the processor, start it, stop it, and so on.   This kind of
authorization in nifi IS something that can be extended/altered to meet
some awesome and complex needs.

Let's keep circling closer to your intent here

thanks

On Wed, Nov 4, 2020 at 1:38 PM Darren Govoni  wrote:

> Hi Bryan,
>Thanks for the input. Right now, I'm really exploring how better to
> accommodate migrating from the use of keytabs to our corporate mandate for
> pkinit support. Observing that the current authorizations in processors
> (basic auth, kerberos etc) are tightly wired, it suggested to me an
> opportunity to move security more into an "aspect" of processors rather
> than woven into the processor specific code. Of course, there are
> interactions that take place throughout the behavior of the processor.
>
> This is a fairly common approach to security, noting that for any given
> behavior, it can be done all the same securely or without security. So I
> would think it should be possible to abstract the authorizations as needed,
> and there are a variety of patterns to pull those details out of the
> components needing them. 🙂
>
> I suppose the difference is more subtle these days, but in my mind
> Authentication does one thing. Decide if principal (e.g.username) is who
> they say they are, using the provided credentials (e.g. password). Once
> this is established the authentication service will return an identifying
> token, cert etc. That's it. Now, some services will inline this activity
> along with authorization.
>
> For us, this is already handled and the identity is established as a
> digitally signed X.509 certificate. Thus, our primary need is to consult
> our security services which will decide if that principal is allowed to do
> something - such as use HDFSProcessor, Query Solr, etc.
>
> In looking at the current code in the processors (and I haven't studied
> them all but will look more closely at HDFS), it didn't seem like a good
> approach to layer another authorization (PKInit) into that existing code
> and it will certainly get crowded in processors doing that, which should
> focus on processing. Just my opinions so far! Subject to change.
>
> Darren
>
> --
> *From:* Bryan Bende 
> *Sent:* Wednesday, November 4, 2020 3:22 PM
>
> *To:* users@nifi.apache.org 
> *Subject:* Re: Authorization Framework
>
> Darren,
>
> I also thought you were talking about authentication. Processors don’t
> really perform authorization, they provide credentials to some system which
> is authentication, the system then decides if they authenticated
> successfully, and then some systems may also perform authorization to
> determine if the authenticated identity is allowed to perform the action.
> The examples you gave of basic auth and kerberos are both authentication
> mechanisms.
>
> I think it will be very hard to not have this logic embedded in processors
> since many times it is specific to the client library being used. For
> example, HDFS processors use the UserGroupInformation class from
> hadoop-common for kerberos authentication where as Kafka processors use the
> Kafka client which takes a JAAS config string.
>
> The parts that can be factored out are usually common things like
> credential holders, such as SSLContextService or KeytabCredentialService,
> both of which don’t really do anything besides hold values that are then
> used in different ways by various processors.
>
> If we are missing what you are talking about, let us know.
>
> Thanks,
>
> Bryan
>
> On Nov 4, 2020, at 2:45 PM, Darren Govoni  wrote:
>
> Thanks Joe.
>
> Just looking to see where community might be going down the road with
> respect to processor security so we can keep our efforts aligned.
>
> In regards to your question I primarily mean authorization. Our company
> already has a SSO that establishes identity credentials so these are then
> used to authorize specific functions and access to certain infrastructure
> systems when constructing flows.
>
> Darren
>

Re: [EXTERNAL] Re: horizontal merge

2020-11-18 Thread Joe Witt
Geoffrey

The process session is requiring you to account for all flow files you've
either created or pulled from the queue.

You have logic which pulls up to 2 things.  It could pull one in which case
you are returning.  You would get the above error from that in those cases.

In the case you get two items you read from 1 and 2 and create a 3rd.  You
now appear to remove 1 and 2 then transfer the 3rd.  That should be fine.

Thanks

On Wed, Nov 18, 2020 at 1:50 PM Greene (US), Geoffrey N <
geoffrey.n.gre...@boeing.com> wrote:

> Session.remove()!  That's very helpful, and it makes my numbers come out
> correctly.  I'm still getting "transfer relationship not specified", though.
>
> Here’s where I’m at now:
>
>
>
> session.read(flowFile1, {inputStream ->
>
> *def* slurper1 = *new* groovy.json.JsonSlurper()
>
> *def* json1 = slurper1.parse(inputStream)
>
> }  *as* InputStreamCallback)
>
>
>
> session.read(flowFile2, {inputStream ->
>
*def* slurper2 = *new* groovy.json.JsonSlurper()
>
> *def* json2 = slurper2.parse(inputStream)
>
> }  *as* InputStreamCallback)
>
>
>
> *def*  mergedFile = session.create()
>
> mergedFile = session.write(mergedFile, {outputStream ->
>
> outputStream.write("new information".bytes)
>
> } *as* OutputStreamCallback)
>
> session.remove (flowFile1)
>
> session.remove(flowFile2)
>
> session.transfer(mergedFile, REL_SUCCESS)
>
>
>
>
>
>
>
> Geoffrey Greene
>
> Senior Software Ninjaneer
>
> (703) 414 2421
>
> The Boeing Company
>
>
>
> *From:* Chris Sampson [mailto:chris.samp...@naimuri.com]
> *Sent:* Wednesday, November 18, 2020 3:33 PM
> *To:* users@nifi.apache.org
> *Subject:* Re: [EXTERNAL] Re: horizontal merge
>
>
>
> This message was sent from outside of Boeing. Please do not click links or
> open attachments unless you recognize the sender and know that the content
> is safe.
>
>
>
>
> You may want a call to `session.remove(flowFile1)` instead of transferring
> it.
>
>
>
> Cheers,
>
> Chris Sampson
>
>
>
> On Wed, 18 Nov 2020, 20:03 Greene (US), Geoffrey N, <
> geoffrey.n.gre...@boeing.com> wrote:
>
> I've gotten closer with grabbing two files and processing them.  I still
> have something wrong in the paradigm though.  Here's what I've got it
> narrowed down to (This is in an ExecuteGroovyScript, BTW.  I hope to
> translate it to an InvokeScriptedProcessor later on, so I can define the
> transfer end points)
>
> // get two files, always two.  Read file1, and write certain fields to
> file2
> flowFileList = session.get(2)
> if (flowFileList.size() != 2)  return
>
> flowFile1 = flowFileList.get(0)
> flowFile2 = flowFileList.get(1)
>
> if(!flowFile1) return
> if(!flowFile2) return
>
> flowFile1 = session.read(flowFile1, {inputStream ->
>def slurper = new groovy.json.JsonSlurper()
> json1 = slurper.parse(inputStream1)
> }  as OutputStreamCallback)
>
> flowFile2 = session.write(flowFile2, {outputStream ->
> outputStream.write("foo plus some data from json1".bytes)
> }  as OutputStreamCallback)
>
> // I really don't want to TRANSFER flow file 1, I want it to go AWAY, but
> // I have to do this
> session.transfer(flowFile1, REL_SUCCESS) // << isn't needed
> session.transfer(flowFile2, REL_SUCCESS)
>
>
> Both files  do get read correctly, and output to success, but Nifi always
> throws the error that "transfer relationship not specified", which I
> gather, means that the call to transfer failed because one file (probably
> flowFile1) is not up to date
>
> Any thoughts?  How do you grab two files at once and then transfer them?
>
> I really only want to transfer just the ONE out, since the data was merged
> in, but I can manage with two files if I have to make # inputs = # outputs
>
> Thanks
>
>
> -Original Message-
> From: Greene (US), Geoffrey N
> Sent: Tuesday, November 17, 2020 8:30 PM
> To: users@nifi.apache.org
> Subject: RE: [EXTERNAL] Re: horizontal merge
>
> The data is actually coming from a rest call,  which provides json.  That
> is pretty smooth at this point.
>
> The challenge seems to be one of associating two different numbers of
> records, and combining them back into one single file (like how do I know
> when I've processed all the records in BOTH files to give one single output
> file).  I like your suggestion of rolling back the session and returning,
> though; I will look into that (though it might mean the file has to be
> processed as one single file, rather than handling them as splits/merges).
>
> I've also been playing with MergeRecords and MergeContent too, and I might
> be making some progress.  My struggle now is trying to figure out when I
> know all records are processed, since I don't have a constant number of
> results to watch for. I may end up writing a file appender, and just
> appending "as I go", so I don't have to do a count.
>
> -Original Message-
> From: Matt Burgess [mailto:mattyb...@apache.org]
> Sent: Tuesday, November 17, 2020 4:22 PM
> To: users@nifi.apache.org
> Subject: [EXTE
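
Stripped of the NiFi session mechanics, the merge step being attempted above (read two JSON payloads, copy selected fields from the first into the second, emit one document) reduces to a sketch like this; the field names are illustrative:

```python
import json


def merge_payloads(payload1: str, payload2: str, fields) -> str:
    """Copy the named top-level fields from the first JSON document
    into the second and return one merged document."""
    doc1 = json.loads(payload1)
    doc2 = json.loads(payload2)
    for field in fields:              # field names are illustrative
        if field in doc1:
            doc2[field] = doc1[field]
    return json.dumps(doc2, sort_keys=True)


merged = merge_payloads(
    '{"id": 7, "status": "done", "extra": true}',
    '{"id": 7, "name": "report"}',
    fields=["status"],
)
print(merged)  # {"id": 7, "name": "report", "status": "done"}
```

In the scripted processor this corresponds to reading both flowfiles, removing them with session.remove(), and transferring only the single merged flowfile.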

Re: NIFI and Out of Memory Error

2020-12-03 Thread Joe Witt
John,

First, as a general rule it is usually very doable to build flows which are
very stream oriented rather than entire file oriented.  That processor by
its nature isn't friendly in this way if configured to work with large
memory chunks.  Alternatives often exist.

Second, I do think it is wise to restart the JVM in the event of an OOME.
There are ways to configure your JVM to do this automatically.  Googling
'restart JVM on oome' for instance could be helpful there.

Thanks

On Thu, Dec 3, 2020 at 11:04 AM jgunvaldson  wrote:

> Just looking for an opinion
>
> Knowing (for one example) that ReplaceText Processor can be very memory
> intensive with large files - we are finding it more and more common to wake
> up to an Out of Memory error like the following
>
> 2020-12-03 15:07:21,748ZUTC ERROR [Timer-Driven Process Thread-31]
> o.a.nifi.processors.standard.ReplaceText
> ReplaceText[id=352afe80-4195-3f56-8798-aaf8be160581]
> ReplaceText[id=352afe80-4195-3f56-8798-aaf8be160581] failed to process
> session due to java.lang.OutOfMemoryError: Java heap space; Processor
> Administratively Yielded for 1 sec: java.lang.OutOfMemoryError: Java heap
> space
> java.lang.OutOfMemoryError: Java heap space
> at
> org.apache.nifi.processors.standard.ReplaceText.onTrigger(ReplaceText.java:255)
> at
> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
> at
> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1162)
>
>
> My question is this. Knowing that "When an OOME occurs in a JVM this can
> cause the JVM to skip instructions. Skipping instructions can compromise
> the integrity of the JVM memory without displaying errors. You can't always
> tell from the outside if a JVM has compromised memory, the only safe thing
> to do is restart the JVM.”
>
> And in this case “Restart NIFI”
>
> Is that “our collective” understanding also, that a Restart of NIFI is
> mandatory - or optional?
>
> Thanks
>
> John
>
>
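
A stream-oriented alternative to whole-file replacement, of the kind a scripted processor could implement, holds only one chunk plus a pattern-length tail in memory. Here is a minimal Python sketch (illustrative, not a NiFi API; it assumes a fixed byte pattern rather than a regex):

```python
import io


def replace_stream(src, dst, search: bytes, replacement: bytes,
                   chunk_size: int = 8192) -> None:
    """Copy src to dst, replacing every occurrence of `search`.

    At most chunk_size + len(search) - 1 bytes are buffered, so
    arbitrarily large inputs never have to fit in memory.
    """
    if not search:
        raise ValueError("search pattern must be non-empty")
    buf = b""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        pieces, start = [], 0
        while True:
            i = buf.find(search, start)
            if i == -1:
                break
            pieces.append(buf[start:i])
            pieces.append(replacement)
            start = i + len(search)
        # Hold back a tail shorter than the pattern: an occurrence
        # could straddle this chunk and the next one.
        keep = min(len(search) - 1, len(buf) - start)
        cut = len(buf) - keep
        dst.write(b"".join(pieces) + buf[start:cut])
        buf = buf[cut:]
    dst.write(buf)  # leftover tail is shorter than the pattern


out = io.BytesIO()
replace_stream(io.BytesIO(b"xxabcdyy" * 2), out, b"abcd", b"Z", chunk_size=4)
print(out.getvalue())  # b'xxZyyxxZyy'
```

The same loop works over the session's input and output streams, so an 800 MB file costs no more heap than a small one.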


Re: NIFI and Out of Memory Error

2020-12-03 Thread Joe Witt
I am honestly not sure if it is required - but it is probably a good idea.
Please let us know what you find using those.

Also, definitely we should change that flow to avoid large memory
consumption.  Want to share more details on the input data and config that
results in this?

On Thu, Dec 3, 2020 at 11:21 AM jgunvaldson  wrote:

> Thanks Joe,
>
> I am getting the general opinion that on OOM a restart is not optional; it
> must be done. In that case I am going to also look at some of the following:
>
> -XX:+ExitOnOutOfMemoryError
> -XX:+CrashOnOutOfMemoryError
>
> *ExitOnOutOfMemoryError*
> When you enable this option, the JVM exits on the first occurrence of an
> out-of-memory error. It can be used if you prefer restarting an instance of
> the JVM rather than handling out of memory errors.
>
> *CrashOnOutOfMemoryError*
> If this option is enabled, when an out-of-memory error occurs, the JVM
> crashes and produces text and binary crash files.
>
> Best Regards
> John
>
>
>
> On Dec 3, 2020, at 10:09 AM, Joe Witt  wrote:
>
> John,
>
> First, as a general rule it is usually very doable to build flows which
> are very stream oriented rather than entire file oriented.  That processor
> by its nature isn't friendly in this way if configured to work with large
> memory chunks.  Alternatives often exist.
>
> Second, I do think it is wise to restart the JVM in the event of an OOME.
> There are ways to configure your JVM to do this automatically.  Googling
> 'restart JVM on oome' for instance could be helpful there.
>
> Thanks
>
>>
>>
>


Re: [Bug] Duplicate Flow Import From Registry

2020-12-10 Thread Joe Witt
Eric,

This is simply NiFi on your laptop (single node?) talking to a Registry
running on some IaaS infrastructure?  There is no load balancer/proxy/etc.
in between?

Thanks

On Thu, Dec 10, 2020 at 4:17 PM Eric Secules  wrote:

> Hello everyone,
>
> My team is encountering a bug where we import a flow from our registry
> residing in the cloud to our laptops. The import takes a long time and we
> end up with multiple copies (about 6) of the same flow one on top of each
> other on the canvas. The canvas becomes unresponsive and we are unable to
> move boxes around.
>
> I am also experiencing slowness when showing local changes, reverting
> local changes and changing the local version.
>
> Thanks,
> Eric
>


Re: Nifi 1.1.14 - too many open files after a while

2020-12-15 Thread Joe Witt
No known leak.

Which JVM are you using?

On Tue, Dec 15, 2020 at 6:18 AM Marel, J. van der (Jasper) <
jasper.van.der.ma...@ing.com> wrote:

> Hi,
>
>
>
> I have installed nifi 1.1.14 on a CentOS7 machine. Ulimits are set to
> 5 open files for the user.
>
> However after a while a lot of open files are reported, mostly related to
> an increasing number of sockets. The Apache nifi canvas is almost blank.
>
> Is this a known issue?
>
>
>
> Version : nifi_3_5_1_1_3-1.11.4.3.5.1.1-3.x86_64 (HortonWorks branded)
>
> OS : Redhat 7.9
>
>
>
> [/proc/27306/fd] ls -la | wc -l | 3742 open files
>
> [/proc/27306/fd] socket | wc -l | 835 open sockets ( and increasing over
> time)
>
>
>
> Met vriendelijke groet / Kind regards,
>
>
> *Jasper van der Marel*
>
>
>
> -
> ATTENTION:
> The information in this e-mail is confidential and only meant for the 
> intended recipient. If you are not the intended recipient, don't use or 
> disclose it in any way. Please let the sender know and delete the message 
> immediately.
> -
>
>


Re: Nifi 1.1.14 - too many open files after a while

2020-12-15 Thread Joe Witt
Hello

I recommend downgrading the jvm to 2.6.1 or upgrading to after 2.7.1.

It is a known issue with that jvm.

thanks

On Tue, Dec 15, 2020 at 6:35 AM Marel, J. van der (Jasper) <
jasper.van.der.ma...@ing.com> wrote:

> Hi,
>
>
>
> I am using this java version :
>
>
>
> java -version
>
> java version "1.8.0_271"
>
> Java(TM) SE Runtime Environment (build 1.8.0_271-b25)
>
> Java HotSpot(TM) 64-Bit Server VM (build 25.271-b25, mixed mode)
>
>
>
> *From:* Joe Witt 
> *Sent:* dinsdag 15 december 2020 14:34
> *To:* users@nifi.apache.org
> *Subject:* Re: Nifi 1.1.14 - too many open files after a while
>
>
>
> No known leak.
>
>
>
> Which JVM are you using?
>
>
>
> On Tue, Dec 15, 2020 at 6:18 AM Marel, J. van der (Jasper) <
> jasper.van.der.ma...@ing.com> wrote:
>
> Hi,
>
>
>
> I have installed nifi 1.1.14 on a CentOS7 machine. Ulimits are set to
> 5 open files for the user.
>
> However after a while a lot of open files are reported, mostly related to
> an increasing number of sockets. The Apache nifi canvas is almost blank.
>
> Is this a known issue?
>
>
>
> Version : nifi_3_5_1_1_3-1.11.4.3.5.1.1-3.x86_64 (HortonWorks branded)
>
> OS : Redhat 7.9
>
>
>
> [/proc/27306/fd] ls -la | wc -l | 3742 open files
>
> [/proc/27306/fd] socket | wc -l | 835 open sockets ( and increasing over
> time)
>
>
>
> Met vriendelijke groet / Kind regards,
>
>
> *Jasper van der Marel*
>
>
>
>
>


Re: Nifi 1.1.14 - too many open files after a while

2020-12-15 Thread Joe Witt
I put dots in there... it's just 261 or later than 271.

If you have Oracle's, I think the latest is 271 and you should downgrade.

thanks

On Tue, Dec 15, 2020 at 6:41 AM Joe Witt  wrote:

> Hello
>
> I recommend downgrading the jvm to 2.6.1 or upgrading to after 2.7.1.
>
> It is a known issue with that jvm.
>
> thanks
>
> On Tue, Dec 15, 2020 at 6:35 AM Marel, J. van der (Jasper) <
> jasper.van.der.ma...@ing.com> wrote:
>
>> Hi,
>>
>>
>>
>> I am using this java version :
>>
>>
>>
>> java -version
>>
>> java version "1.8.0_271"
>>
>> Java(TM) SE Runtime Environment (build 1.8.0_271-b25)
>>
>> Java HotSpot(TM) 64-Bit Server VM (build 25.271-b25, mixed mode)
>>
>>
>>
>> *From:* Joe Witt 
>> *Sent:* dinsdag 15 december 2020 14:34
>> *To:* users@nifi.apache.org
>> *Subject:* Re: Nifi 1.1.14 - too many open files after a while
>>
>>
>>
>> No known leak.
>>
>>
>>
>> Which JVM are you using?
>>
>>
>>
>> On Tue, Dec 15, 2020 at 6:18 AM Marel, J. van der (Jasper) <
>> jasper.van.der.ma...@ing.com> wrote:
>>
>> Hi,
>>
>>
>>
>> I have installed nifi 1.11.4 on a CentOS7 machine. Ulimits are set to
>> 5 open files for the user.
>>
>> However after a while a lot of open files are reported, mostly related to
>> an increasing number of sockets. The Apache nifi canvas is almost blank.
>>
>> Is this a known issue?
>>
>>
>>
>> Version : nifi_3_5_1_1_3-1.11.4.3.5.1.1-3.x86_64 (HortonWorks branded)
>>
>> OS : Redhat 7.9
>>
>>
>>
>> [/proc/27306/fd] ls -la | wc -l | 3742 open files
>>
>> [/proc/27306/fd] socket | wc -l | 835 open sockets ( and increasing over
>> time)
>>
>>
>>
>> Met vriendelijke groet / Kind regards,
>>
>>
>> *Jasper van der Marel*
>>
>>
>>
>>
>>


Re: Nifi 1.12.1 cluster is getting hung after few days(15 days)

2021-01-07 Thread Joe Witt
Hello

Please capture and share a full thread dump by running bin/nifi.sh dump,
and please post the files somewhere they're easier to read than this email thread.

Thanks

On Thu, Jan 7, 2021 at 5:22 AM sanjeet rath  wrote:

> Hi All,
>
> Could someone please give me thoughts on the issue in the trailing mail, so
> I can continue my analysis.
>
> Regards,
> Sanjeet
>
> On Wed, 6 Jan 2021, 7:40 pm sanjeet rath,  wrote:
>
>> Hi All,
>>
>> Happy New Year :)
>>
>> I upgraded our cluster from 1.8 to 1.12.1 a few days ago and everything
>> was working fine. Then I observed that NiFi hung after running for a few
>> days (roughly 15 days after the NiFi service started). After login the
>> browser keeps on loading, and in bootstrap.log I saw this message:
>> "*Apache nifi is running at PID () but not responding to ping requests*”.
>> This happened to only one node from a 3 node cluster.
>>
>> This issue happened *3 times on different cluster on different nodes.*
>>
>> *Everytime issue got fixed by restarting NiFi service.*
>>
>> During the hung state I checked the resource utilisation:
>>
>>  -> top -n 1 -H -p 943785 (nifi processid )
>>
>>
>> top - 08:26:36 up 40 days, 3:48, 2 users, load average: 5.28, 5.38, 5.43
>> Threads: 239 total, 4 running, 235 sleeping, 0 stopped, 0 zombie
>> %Cpu(s): 98.7 us, 1.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
>> MiB Mem : 15829.5 total, 610.8 free, 10823.7 used, 4395.0 buff/cache
>> MiB Swap: 0.0 total, 0.0 free, 0.0 used. 4456.1 avail Mem
>>
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>>
>> *943806* root 20 0 12.5g 9.4g 18692 R *88.9* 60.7 12698:50 *GC Thread#1 *
>>
>> 943807 root 20 0 12.5g 9.4g 18692 R 88.9 60.7 12698:48 GC Thread#2
>>
>> 943808 root 20 0 12.5g 9.4g 18692 R 88.9 60.7 12698:58 GC Thread#3
>>
>>  943787 root 20 0 12.5g 9.4g 18692 R 83.3 60.7 12698:51 GC Thread#0
>>
>> 943785 root 20 0 12.5g 9.4g 18692 S 0.0 60.7 0:00.00 java
>>
>>
>> We have a 4 core CPU, and all *4 GC threads* stay in this state,
>> consuming most of the CPU. *The cluster was hung for 2 days.* After 2 days
>> these threads moved on and NiFi came out of the hung state on this
>> node, but then another node from the same cluster moved into the hung state
>> in a similar fashion: 4 threads busy in GC and consuming most of the CPU.
>>
>>
>> Could you please help me to identify what could be the possible reason.
>>
>> Details:
>>
>> Nifi 1.12.1
>>
>> Jdk 11
>>
>> Zookeeper 3.5.8
>>
>> 16g memory
>>
>>
>>
>> Thanks,
>> --
>> Sanjeet Kumar Rath,
>> mob- +91 8777577470
>>
>>
>>


Re: NiFi Cluster 1.9.2 Content Repository

2021-01-19 Thread Joe Witt
Hello

The key value to watch is the 50% value. That means we will work to remove
content no longer reachable in the flow until we are at 50% of the
available disk space for that volume.
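As a back-of-the-envelope check, that ceiling can be expressed like this (a simplified sketch of the cleanup policy described above, not NiFi's actual code; the function name and units are illustrative):

```python
def archive_bytes_to_remove(disk_total_bytes, repo_usage_bytes, max_usage=0.50):
    """How much archived content must be purged so the content repository
    falls back under max_usage (the archive.max.usage.percentage property)."""
    ceiling = disk_total_bytes * max_usage
    return max(0.0, repo_usage_bytes - ceiling)

GIB = 1024 ** 3
# A 100 GiB volume with 60 GiB of content-repo usage: 10 GiB of archive
# is eligible for removal (oldest first; anything past the 12 hour
# max.retention.period goes regardless).
print(archive_bytes_to_remove(100 * GIB, 60 * GIB) / GIB)  # 10.0
```

With a small volume, that ceiling is reached quickly, which is the usual reason content becomes unavailable for replay after only a few minutes.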

So how big is the disk there for each node?

Can you share more about what you are hoping to see happen against what is
happening?

Thanks

On Tue, Jan 19, 2021 at 5:18 PM Rosso, Roland 
wrote:

> Hello all,
>
>
>
> We have  a 3 node NiFi 1.9.2 cluster for which the content repository
> config is below. I don’t have the entire history of this install, however
> we are only able to retrieve the content from the flows that ran within the
> past couple of minutes.
>
> All others when trying to view NiFi Data Provenance -> Content  will show
>
> *Replay*
>
> Content is no longer available in Content Repository
>
>
>
> Checking all 3 nodes at intervals, the content repository size on disk is:
>
> Node 1:  1.1G, goes up to 2.1G, back down to 1.1G. This is currently the
> coordinator
>
> Node 2:  5.9G, static
>
> Node 3: 201M, static
>
>
>
> Is there a default size for the content repository set around 4GB? Looking
> at the documentation, I can’t seem to find the answer to that question.
>
>
>
> # Content Repository
>
>
> nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
>
> nifi.content.claim.max.appendable.size=1 MB
>
> nifi.content.claim.max.flow.files=100
>
> nifi.content.repository.directory.default=/u12/nifi/data/content_repository
>
> nifi.content.repository.archive.max.retention.period=12 hours
>
> nifi.content.repository.archive.max.usage.percentage=50%
>
> nifi.content.repository.archive.enabled=true
>
> nifi.content.repository.always.sync=false
>
> nifi.content.viewer.url=../nifi-content-viewer/
>
>
>
> Thank you for your help,
>
> *Roland*
>
>
> This message (including any attachments) is intended only for the use of
> the individual or entity to which it is addressed and may contain
> information that is non-public, proprietary, privileged, confidential, and
> exempt from disclosure under applicable law or may constitute as attorney
> work product. If you are not the intended recipient, you are hereby
> notified that any use, dissemination, distribution, or copying of this
> communication is strictly prohibited. If you have received this
> communication in error, notify us immediately by telephone and (i) destroy
> this message if a facsimile or (ii) delete this message immediately if this
> is an electronic communication. Thank you.
>


Re: [E] Re: NIFI show version changed *, but version control show no local changes

2021-01-27 Thread Joe Witt
There is no requirement to use the registry.  It simply gives you a way to
store versioned flows which you can reference/use from zero or more nifi
clusters/flows to help keep things in line.  Many teams use this to ensure
as flows are improved over time and worked through dev/test/stage/prod
environments that they graduate properly.

Thanks

On Wed, Jan 27, 2021 at 8:31 AM Maksym Skrynnikov <
skrynnikov.mak...@verizonmedia.com> wrote:

> We use NiFi version 1.12.1 but we do not use NiFi Registry; I wonder
> if using the Registry is a requirement?
>
> On Wed, Jan 27, 2021 at 2:25 PM Bryan Bende  wrote:
>
>> Please specify the versions of NiFi and NiFi Registry. If it is not
>> the latest (1.12.1 and 0.8.0), then it would be good to try with the
>> latest since there have been significant improvements around this area
>> in the last few releases.
>>
>> On Wed, Jan 27, 2021 at 5:45 AM Jens M. Kofoed 
>> wrote:
>> >
>> > Hi
>> >
>> > We have a situation where process groups in NiFi show they are not up
>> to date in version control. They show a *. But going to version control to
>> see local changes, there are none.
>> > NiFi reports back that there are no local changes. Submitting a new
>> version makes no difference. A new version is created, but NiFi still shows
>> the * and not the green check mark.
>> >
>> > I have tried to restart Registry which doesn't help.
>> >
>> > Restarting NiFi helps for a short while. After restarting NiFi the
>> process group shows the green check mark, and another group which is under
>> the same version control now shows it needs an update. After updating the
>> 2nd process group to the new version, this process group now shows the * and
>> not the green check mark. Going to version control to see local changes,
>> there are none.
>> >
>> > Anybody who have experience with this issue?
>> >
>> > bug report created:
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_NIFIREG-2D437-3F&d=DwIFaQ&c=sWW_bEwW_mLyN3Kx2v57Q8e-CRbmiT9yOhqES_g_wVY&r=nRtn9-9qg4PKzRb3YqAHXrLTXJYN1G0ZisUsm-XYLkObBvdpApuffGYoI9OPgBKm&m=Z9hTZ0OdCBCst-23EzCV6YNkdOQs--8BkHDlBqQlU2k&s=YVW9lyT5J-D2oUEeIGACI2vGYBHemlqdwupU_Q_5HuU&e=
>> >
>> > kind regards
>> >
>> > Jens M. Kofoed
>>
>


Re: Lagging worker nodes

2021-01-28 Thread Joe Witt
I'm assuming also this is the same thing Maksym was asking about
yesterday.  Let's try to keep the thread together as this gets discussed.

On Thu, Jan 28, 2021 at 1:10 PM Pierre Villard 
wrote:

> Hi Zilvinas,
>
> I'm afraid we would need more details to help you out here.
>
> My first question by quickly looking at the graph would be: there is a
> host (green line) where the number of queued flow files is more or less
> constantly growing. Where in the flow are the flow files accumulating for
> this node? What processor is creating back pressure? Do we have anything in
> the log for this node around the time where flow files start accumulating?
>
> Thanks,
> Pierre
>
> Le ven. 29 janv. 2021 à 00:02, Zilvinas Saltys <
> zilvinas.sal...@verizonmedia.com> a écrit :
>
>> Hi,
>>
>> We run a 25 node Nifi cluster on version 1.12. We're processing about
>> 2000 files per 5 mins where each file is from 100 to 500 megabytes.
>>
>> What I notice is that some workers degrade in performance and keep
>> accumulating a queued files delay. See attached screenshots where it shows
>> two hosts where one is degraded.
>>
>> One seemingly dead giveaway is that the degraded node starts doing heavy
>> and intensive disk read io while the other node keeps doing none. I ran
>> iostat on those nodes and I know that the read IOs are on the
>> content_repository directory. But it makes no sense to me how some of the
>> nodes who are doing these heavy tasks are doing no disk read io. In this
>> example I know that both nodes are processing roughly the same amount of
>> files and of same size.
>>
>> The pipeline is somewhat simple:
>> 1) Read from SQS 2) Fetch file contents from S3 3) Publish file contents
>> to Kafka 4) Compress file contents 5) Put compressed contents back to S3
>>
>> All of these operations to my understanding should require heavy reads
>> from local disk to fetch file contents from content repository? How is such
>> a thing possible that some nodes are processing lots of files and are not
>> showing any disk reads and then suddenly spike in disk reads and degrade?
>>
>> Any clues would be really helpful.
>> Thanks.
>>
>


Re: NiFi 1.12.1 queue balancing

2021-01-28 Thread Joe Witt
Maksym,

Very difficult to look at these brief/limited details and offer meaningful
responses.  In the example you show, the data volumes are so small
that I don't even know that load balancing would kick in.  But yes, generally
speaking, the combination of load balancing and back pressure controls can
yield extremely well balanced flows even when you have
heterogeneous infrastructure.

Thanks

On Thu, Jan 28, 2021 at 4:45 AM Maksym Skrynnikov <
skrynnikov.mak...@verizonmedia.com> wrote:

> I am running NiFi 1.12.1 without NiFi Registry and have a connection that
> is configured to *Round Robin *flow files. After some time I see some
> nodes performing worse than the others and the queue is piling up on 1-2
> nodes.
> [image: niifi-queue.jpg]
>
> The question I have is: how does rebalancing actually work? Would it ideally
> try to rebalance what's already on the node? Or would underperforming nodes
> always end up with the largest queue? I expected it to rebalance the load so
> each node would get its portion of the queue.
>
> Thank you
>


Re: [E] Re: Lagging worker nodes

2021-01-28 Thread Joe Witt
Saltys

It can be possible because those things can still be cached.  The way this
thing really works at scale can be quite awesome actually.

However, we definitely want to help you understand what is happening, but the
pictures alone don't cut it.  We appreciate you have sensitivities/stuff you
have to remove.  But that is also a major factor in being able to help.

We need details on how processors are configured.

Thanks

On Thu, Jan 28, 2021 at 1:45 PM Zilvinas Saltys <
zilvinas.sal...@verizonmedia.com> wrote:

> We're still on an old version of Kafka; that's why we're still using the old
> processors.
>
> File sizes vary .. Generally they are all within +-100mb range before they
> are uncompressed. There can be some small files but they are not a
> majority. From logging I can see that all hosts are processing files of all
> sizes.
>
> Our SQS processor runs on all nodes and takes 1 message only. We force
> initial balancing this way.
>
> Any idea how a node can publish a 400 mb file to Kafka and not show any
> DISK read IO at the same time? How could something like that be possible?
> Is there any way where Nifi would not read the file out of the
> local content repo but have it cached? Or could this be just the kernel
> caching the entire content repo device?
>
> Thanks
>
> On Thu, Jan 28, 2021 at 8:39 PM Pierre Villard <
> pierre.villard...@gmail.com> wrote:
>
>> Not saying this is the issue, but is your Kafka cluster using Kafka 0.11?
>> Looking at the screenshot, you're using the Kafka processors from the 0.11
>> bundle, you might want to look at the processors for Kafka 2.x instead.
>>
>> Are your files more or less evenly distributed in terms of sizes?
>> I suppose your SQS processor is running on the primary node only? What
>> node is that in the previous screenshot?
>>
>> Pierre
>>
>> Le ven. 29 janv. 2021 à 00:28, Zilvinas Saltys <
>> zilvinas.sal...@verizonmedia.com> a écrit :
>>
>>> My other issue is that the balancing is not rebalancing the queue?
>>> Perhaps I misunderstand how balancing should work and it only balances
>>> round robin new incoming files? I can easily manually rebalance by
>>> disabling balancing and enabling it again, but after a while it gets back to
>>> the same situation where some nodes fall further and further behind
>>> while others remain fine.
>>>
>>> On Thu, Jan 28, 2021 at 8:22 PM Zilvinas Saltys <
>>> zilvinas.sal...@verizonmedia.com> wrote:
>>>
>>>> Hi Joe,
>>>>
>>>> Yes it is the same issue. We have used your advice and reduced the
>>>> amount of threads on our large processors: fetch/compress/publish to a
>>>> minimum and then increased gradually to 4 until the processing rate became
>>>> acceptable (about 2000 files per 5 min). This is a cluster of 25 nodes of
>>>> 36 cores each.
>>>>
>>>> On Thu, Jan 28, 2021 at 8:19 PM Joe Witt  wrote:
>>>>
>>>>> I'm assuming also this is the same thing Maksym was asking about
>>>>> yesterday.  Let's try to keep the thread together as this gets discussed.
>>>>>
>>>>> On Thu, Jan 28, 2021 at 1:10 PM Pierre Villard <
>>>>> pierre.villard...@gmail.com> wrote:
>>>>>
>>>>>> Hi Zilvinas,
>>>>>>
>>>>>> I'm afraid we would need more details to help you out here.
>>>>>>
>>>>>> My first question by quickly looking at the graph would be: there is
>>>>>> a host (green line) where the number of queued flow files is more or less
>>>>>> constantly growing. Where in the flow are the flow files accumulating for
>>>>>> this node? What processor is creating back pressure? Do we have anything 
>>>>>> in
>>>>>> the log for this node around the time where flow files start 
>>>>>> accumulating?
>>>>>>
>>>>>> Thanks,
>>>>>> Pierre
>>>>>>
>>>>>> Le ven. 29 janv. 2021 à 00:02, Zilvinas Saltys <
>>>>>> zilvinas.sal...@verizonmedia.com> a écrit :
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> We run a 25 node Nifi cluster on version 1.12. We're processing
>>>>>>> about 2000 files per 5 mins where each file is from 100 to 500 
>>>>>>> megabytes.
>>>>>>>
>>>>>>> What I notice is

Re: InvokeHTTP hangs after several successful calls

2021-01-28 Thread Joe Witt
The likely relevant thread is here

"Timer-Driven Process Thread-9" #70 prio=5 os_prio=31 cpu=12025.30ms
elapsed=4403.88s tid=0x7fe44f16b000 nid=0xe103 in Object.wait()
 [0x7fed4000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(java.base@11.0.5/Native Method)
- waiting on 
at java.lang.Object.wait(java.base@11.0.5/Object.java:328)
at okhttp3.internal.http2.Http2Stream.waitForIo(Http2Stream.java:577)
at
okhttp3.internal.http2.Http2Stream.takeResponseHeaders(Http2Stream.java:143)
- waiting to re-lock in wait() <0x0007e007d818> (a
okhttp3.internal.http2.Http2Stream)
at
okhttp3.internal.http2.Http2Codec.readResponseHeaders(Http2Codec.java:120)
at
okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:75)
at
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at
okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
at
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at
okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
at
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at
okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at
okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
at
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
at okhttp3.RealCall.execute(RealCall.java:69)
at
org.apache.nifi.processors.standard.InvokeHTTP.onTrigger(InvokeHTTP.java:793)
at
org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at
org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1176)
at
org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:213)
at
org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
at java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.5
/Executors.java:515)
at java.util.concurrent.FutureTask.runAndReset(java.base@11.0.5
/FutureTask.java:305)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@11.0.5
/ScheduledThreadPoolExecutor.java:305)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.5
/ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.5
/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base@11.0.5/Thread.java:834)

This implies that it is either genuinely waiting for IO that the remote
server is failing to send or has gotten itself into a state it could never
break out of.  Can you please use this exact same flow on the latest NiFi
release 1.12.1 to see if the issue remains?
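The failure mode in that stack is a read blocked waiting for response headers, and the generic defence is a bounded read timeout (in InvokeHTTP, the Read Timeout property). The same idea in a plain-Python sketch, using a throwaway local socket that accepts the connection but never answers:

```python
import socket

# A listener that completes the TCP handshake but never sends a response,
# standing in for a server that goes quiet mid-request.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)

cli = socket.create_connection(srv.getsockname(), timeout=0.2)
try:
    cli.recv(1)          # would block indefinitely without the timeout...
    timed_out = False
except socket.timeout:
    timed_out = True     # ...instead we regain control after 0.2s
finally:
    cli.close()
    srv.close()

print(timed_out)  # True
```

A timeout does not fix a server that stalls, but it turns a silently stuck thread into an error the flow can route and retry.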

Thanks

On Thu, Jan 28, 2021 at 2:43 PM Vijay Chhipa  wrote:

> Hi Joe,
>
> Thanks for looking into this issue.
> We are on NiFi 1.10.0
>
> Please see the attached
> 1. thread dump file
> 2. nifi-app.log
> 3. nifi-bootstrap.log
> 4. nifi.properties
>
> The InvokeHTTP configuration is as shown below.
>
>
>
>
>
> On Jan 25, 2021, at 7:24 AM, Joe Witt  wrote:
>
> Hello
>
> If you suspect an actual stuck/hung thread then the best course of action
> is to generate a thread dump.  This can be done in a couple ways but one of
> the easiest is to run 'bin/nifi.sh dump'.  We would need to see the
> nifi-app.log(s) and nifi-bootstrap.log(s).  In addition the report should
> include the specific nifi version being used specifics of the processor
> configuration.
>
> Thanks
>
> On Mon, Jan 25, 2021 at 6:12 AM Jairo Henao 
> wrote:
>
>> Hello,
>>
>> In my case something similar happens. I invoke a REST-JSON service that
>> sometimes takes more than 10 minutes to respond (I have set the timeout to
>> 15 minutes) and after the second or third call the service responds (I
>> could check it on the server side) but the processor remains waiting for
>> the response. I try to stop it and after "terminate" it but the thread
>> seems to stay active, so I delete it, (disconnect it and remove it from the
>> flow) and add a new InvokeHttp and it works again.
>>
>> Given the difficulty of finding steps to reproduce it, I have not
>> reported it. I hope someone can help us.
>>
>> On Mon, Jan 25,

Re: Monitoring NiFi Root Level Controller Services

2021-02-01 Thread Joe Witt
John,

You're using a vendor distribution of NiFi.  You should contact the vendor.

You can certainly monitor the state of a controller service via the REST
API.  They should either be enabled or not enabled.
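A minimal polling check built on that API could look like the sketch below. Assumptions to note: the endpoint shown is the flow API's controller-services listing for the root process group, the host/port are placeholders, and authentication/alerting are omitted.

```python
import json
from urllib.request import urlopen

def not_enabled_services(flow_json):
    """From the JSON body of the controller-services listing, return
    (name, state) for every service whose state is not ENABLED."""
    return [
        (cs["component"]["name"], cs["component"]["state"])
        for cs in flow_json.get("controllerServices", [])
        if cs["component"]["state"] != "ENABLED"
    ]

# Example poll (placeholder host/port, no auth shown):
# url = "http://nifi-host:8080/nifi-api/flow/process-groups/root/controller-services"
# for name, state in not_enabled_services(json.load(urlopen(url))):
#     print("ALERT: controller service %s is %s" % (name, state))
```

Run on a schedule, anything this returns is a service that has dropped out of the ENABLED state and should page someone.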

Thanks

On Mon, Feb 1, 2021 at 5:39 PM jgunvaldson  wrote:

> Hi All,
>
> Root level (canvas) Controller Services - We tend to setup several Root
> level Controller services for developers that are typically
> DBCPConnectionPools and DistributedMapCacheClientService and maybe a few
> other. MOST importantly, these controller services cannot be down and
> cannot be disabled - must remain Enabled at all times.
>
> We have now had a few outages where upon examination a Controller Service
> has encountered “something” that caused it to be Disabled or Down.
>
> Is there a standard practice we can use to “Monitor” the controller
> services and ensure we get alerted if any one of them goes into a Disabled
> state?
>
> What do folks generally think is a good monitoring practice?
>
> We are on
>
> HDF Version 3.4.1.1.
>
> Powered by Apache NiFi Version 1.9.0
> 1.9.0.3.4.1.1-4 built 05/01/2019 02:15:30 UTC
> Tagged nifi-1.9.0-RC2
> From 7410fa4 on branch UNKNOWN
>
>
>


Re: Monitoring NiFi Root Level Controller Services

2021-02-01 Thread Joe Witt
also sorry for referencing you as 'John' - not sure why I just assumed that
:)

On Mon, Feb 1, 2021 at 5:46 PM Joe Witt  wrote:

> John,
>
> You're using a vendor distribution of NiFi.  You should contact the vendor.
>
> You can certainly monitor the state of a controller service via the REST
> API.  They should either be enabled or not enabled.
>
> Thanks
>
> On Mon, Feb 1, 2021 at 5:39 PM jgunvaldson  wrote:
>
>> Hi All,
>>
>> Root level (canvas) Controller Services - We tend to setup several Root
>> level Controller services for developers that are typically
>> DBCPConnectionPools and DistributedMapCacheClientService and maybe a few
>> other. MOST importantly, these controller services cannot be down and
>> cannot be disabled - must remain Enabled at all times.
>>
>> We have now had a few outages where upon examination a Controller Service
>> has encountered “something” that caused it to be Disabled or Down.
>>
>> Is there a standard practice we can use to “Monitor” the controller
>> services and ensure we get alerted if any one of them goes into a Disable
>> state?
>>
>> What do folks generally think is a good monitoring practice?
>>
>> We are on
>>
>> HDF Version 3.4.1.1.
>>
>> Powered by Apache NiFi Version 1.9.0
>> 1.9.0.3.4.1.1-4 built 05/01/2019 02:15:30 UTC
>> Tagged nifi-1.9.0-RC2
>> From 7410fa4 on branch UNKNOWN
>>
>>
>>


Re: After upgrade to 1.11.4, flowController fails to start due to invalid clusterCoordinator port 0

2021-02-08 Thread Joe Witt
PatW

I'd triple-check to ensure there are no weird/special/unexpected characters
in your nifi.properties file.  These are often not obvious in default text
views so you might need to explicitly set some view to expose them.

Yeah this is certainly not a great user experience - we give you just
enough to have an idea but leave plenty to the imagination here.

I suppose the good news is we know it is a port.

Check lines in/around
nifi.remote.input.socket.port=
nifi.web.http.port=8080
nifi.web.https.port=
nifi.cluster.node.protocol.port=
nifi.cluster.load.balance.port=6342
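One way to make those hidden characters visible is a small scan over the file (a hypothetical helper, not a NiFi tool; it flags anything outside printable ASCII plus tab, which covers the non-breaking spaces and stray control characters that typically sneak into copied property files):

```python
def suspicious_chars(text):
    """Return (line_no, col, codepoint) for any character outside
    printable ASCII plus tab -- these often hide in pasted property values."""
    hits = []
    for ln, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if not (ch == "\t" or 0x20 <= ord(ch) <= 0x7E):
                hits.append((ln, col, hex(ord(ch))))
    return hits

# Example: a trailing non-breaking space after a port value
sample = "nifi.cluster.node.protocol.port=50233\u00a0\n"
print(suspicious_chars(sample))  # [(1, 38, '0xa0')]
```

Pointing it at the real file is just `suspicious_chars(open("conf/nifi.properties", encoding="utf-8").read())`; a hit on one of the port lines would explain the property parsing back to 0.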

Thanks

On Mon, Feb 8, 2021 at 2:47 PM Pat White  wrote:

> Hi Folks,
>
> Appreciate any debugging help on a very odd error, after upgrading a Nifi
> cluster from 1.6.0 to 1.11.4, flowController fails to start due to:
>
> Caused by: org.springframework.beans.factory.BeanCreationException: Error
> creating bean with name 'flowService': FactoryBean threw exception on
> object creation; nested exception is
> org.springframework.beans.factory.BeanCreationException: Error creating
> bean with name 'flowController' defined in class path resource
> [nifi-context.xml]: Cannot resolve reference to bean 'clusterCoordinator'
> while setting bean property 'clusterCoordinator'; nested exception is
> org.springframework.beans.factory.BeanCreationException: Error creating
> bean with name 'clusterCoordinator': FactoryBean threw exception on object
> creation; nested exception is java.lang.IllegalArgumentException: Port must
> be inclusively in the range [1, 65535].  Port given: 0
>
>
> The error trace is very similar to the example Andy described in
> NIFI-6336; the issue there, I believe, is not specifying
> 'nifi.cluster.node.protocol.port' in 'nifi.properties'. However, my conf has
> that set, 'nifi.cluster.node.protocol.port=50233', and should be using
> 50233 instead of '0'.
>
> Cluster had been running fine previously and as far as i can tell, Nifi
> and ZK confs and settings are all ok. Also compared to another cluster that
> had been upgraded with no issues, and is running 1.11.4 just fine.
>
> I increased debug logging but without success so far. Am I looking at the
> right property association?
>
> patw
>
>
>


[ANNOUNCE] Apache NiFi 1.13.0 release

2021-02-16 Thread Joe Witt
Hello

The Apache NiFi team would like to announce the release of Apache NiFi
1.13.0.

This release includes 260 new features, bug fixes and improvements.

Apache NiFi is an easy to use, powerful, and reliable system to process and
distribute
data.  Apache NiFi was made for dataflow.  It supports highly configurable
directed graphs
of data routing, transformation, and system mediation logic.

More details on Apache NiFi can be found here:
https://nifi.apache.org/

The release artifacts can be downloaded from here:
https://nifi.apache.org/download.html

Maven artifacts have been made available and mirrored as per normal ASF
artifact processes.

Issues closed/resolved for this list can be found here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12348700

Release note highlights can be found here:
https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.13.0

Thank you
The Apache NiFi team



Re: NIFI - Topic pattern names

2021-02-23 Thread Joe Witt
See 
http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-kafka-2-6-nar/1.13.0/org.apache.nifi.processors.kafka.pubsub.ConsumeKafka_2_6/index.html
Look at 'Topic Name(s)' and 'Topic Name Format' with a provided naming pattern.
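With 'Topic Name Format' set to use a pattern, the 'Topic Name(s)' value is treated as a regular expression matched against each topic name. The matching semantics can be sketched in Python (the `orders_` prefix is only an illustration; substitute your own starting word):

```python
import re

# Equivalent of setting "Topic Name(s)" to the pattern orders_.* :
# any topic whose full name matches the expression is consumed.
pattern = re.compile(r"orders_.*")

topics = ["orders_us", "orders_eu", "payments", "orders"]
matched = [t for t in topics if pattern.fullmatch(t)]
print(matched)  # ['orders_us', 'orders_eu']
```

Note the match is against the whole name, so a bare `orders` is not picked up by `orders_.*`; widen the pattern if you want it included.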

Thanks

On Tue, Feb 23, 2021 at 1:55 PM KhajaAsmath Mohammed
 wrote:
>
> Hi,
>
> I am planning to consume multiple topics from Kafka that start with a
> particular word. May I know how to do this?
>
> I dont want to enter all the names.
>
> Thanks,
> Asmath


Re: Questions about the GetFile processor

2021-02-26 Thread Joe Witt
Hello

Yeah when there are a ton (50k or more) of files in a directory performance
is *horrible*.   If you can put them into some subdirs to divide it up then
it will go a lot faster.
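If a one-time reorganisation is an option, the flat directory can be bucketed by a hash of each filename and ListFile pointed at the parent with recursion enabled. A sketch (the fanout and the crc32 bucketing are arbitrary choices, not anything NiFi requires):

```python
import os
import shutil
import zlib

def bucket_files(src_dir, fanout=256):
    """Spread the regular files in src_dir across fanout subdirectories,
    keeping any single directory listing small and fast."""
    for name in os.listdir(src_dir):
        path = os.path.join(src_dir, name)
        if not os.path.isfile(path):
            continue  # leave existing subdirectories alone
        # Stable hash of the filename picks the bucket, e.g. "a3"
        bucket = "%02x" % (zlib.crc32(name.encode()) % fanout)
        dest = os.path.join(src_dir, bucket)
        os.makedirs(dest, exist_ok=True)
        shutil.move(path, os.path.join(dest, name))
```

With ~4M files and a fanout of 256 each subdirectory holds roughly 15k entries, which keeps individual listings well under the point where they crawl.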

Thanks

On Fri, Feb 26, 2021 at 7:30 PM Jean-Sebastien Vachon <
jsvac...@brizodata.com> wrote:

> Hi again,
>
> I need to reprocess all my files after we discovered a problem. My folder
> contains 3,906,135 JSON files (590GB total size).
> I tried the ListFile strategy, and it works fine on a small subset but on
> the whole dataset not a single FlowFile was queued after many hours of waiting.
>
> Is it normal that it takes so long to do something?
>
> I am using the following settings:
>
>   Tracking Timestamps,
>   no recurse,
>   file filter is set to the default ([^\.].*),
>   the minimal size is 0b and the min age is 0s,
>   track performance is off,
>   max number of files is set to 5,000,000
>   max disk op time is 10 s
>   max directory listing time is 3 hours
>
> Am I doing something wrong? my server is quite capable with 512GB of Ram
> and 128 cores.
>
> Thanks
>
>
> *Jean-Sébastien Vachon *
> Co-Founder & Architect
>
>
> *Brizo Data, Inc. www.brizodata.com
> 
> *
> --
> *From:* Jean-Sebastien Vachon 
> *Sent:* Thursday, February 18, 2021 8:59 AM
>
> *To:* users@nifi.apache.org 
> *Subject:* Re: Questions about the GetFile processor
>
> OK thanks
>
> I missed that part of the documentation. Stupid me
>
>
> *Jean-Sébastien Vachon *
> Co-Founder & Architect
>
>
> *Brizo Data, Inc. www.brizodata.com
> 
> *
> --
> *From:* Arpad Boda 
> *Sent:* Thursday, February 18, 2021 8:46 AM
> *To:* users@nifi.apache.org 
> *Subject:* Re: Questions about the GetFile processor
>
> GetFile has no persistence.
> Actually it has, but it's called your hard drive. :)
>
> If you take a look at the documentation:
> *Keep Source File - *"If true, the file is not deleted after it has been
> copied to the Content Repository; this causes the file to be picked up
> continually and is useful for testing purposes. If not keeping original
> NiFi will need write permissions on the directory it is pulling from
> otherwise it will ignore the file."
>
> You can see that it's going to get the same files over and over again
> unless you configure it to delete the already processed ones.
>
> The reason I suggested the combination above is that listfile can be
> triggered once, the metadata (filenames) are stored in your queue and
> fetchfile can process them later.
>
> On Thu, Feb 18, 2021 at 2:39 PM Jean-Sebastien Vachon <
> jsvac...@brizodata.com> wrote:
>
> OK I understand your point.. sorry (early morning) 😉
>
> I am kind of stuck with the GetFile processor for now. Is there a way to
> know how many files are left to process?
>
> Will it go on forever, or will it stop streaming once all files have been
> processed? (there are no new files in the folder... everything was there at
> the beginning)
>
> Thanks
>
>
> *Jean-Sébastien Vachon *
> Co-Founder & Architect
>
>
> *Brizo Data, Inc. www.brizodata.com
> 
> *
> --
> *From:* Jean-Sebastien Vachon 
> *Sent:* Thursday, February 18, 2021 8:34 AM
> *To:* users@nifi.apache.org 
> *Subject:* Re: Questions about the GetFile processor
>
> Thanks for your comment. However, I can't queue everything as the total
> size of the data is around 560GB.
> Right now, I am using a GetFile processor and it has been running for a
> few days. If I look at my end point, it looks like it should be done pretty
> soon but data is still
> streaming in at the same rate so I was wondering if the processor
> remembers every single file it has already processed or if it is simply
> going through all the files alphabetically or in whatever order it decides.
>
> Thanks
>
>
> *Jean-Sébastien Vachon *
> Co-Founder & Architect
>
>
> *Brizo Data, Inc. www.brizodata.com
> 
> *
> --
> *From:* Arpad Boda 
> *Sent:* Thursday, February 18, 2021 8:29 AM
> *To:* users@nifi.apache.org 
> *Subject:* Re: Questions about the GetFile processor
>
> You can use the combination of listfile and fetchfile.
> In the queue between the two you are going to see the number of
> (flow)files left to be processed.
>
> On Thu, Feb 18, 2021 at 2:14 PM Jean-Sebastien Vachon <
> jsvac...@brizodata.com> wrote:
>
> Hi all,
>
> If I configure a GetFile processor to list all JSON files under a given
> folder, will it stop sending flows once it has processed all files?
> My folder contains thousands of files and the processor reads them by
> small batch (10) every 30s.
>
> Is there a way to know how many files are left to be processed?
>
> Thanks
>
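The ListFile + FetchFile pattern suggested in this thread can be sketched in plain Python (illustrative only — not NiFi code; the temp directory and file names are made up): listing once produces a queue of metadata, and the queue depth at any moment is exactly the number of files left to process.

```python
import os
import tempfile
from collections import deque

# Build a small throwaway directory so the sketch is self-contained.
source_dir = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(source_dir, f"part-{i}.json"), "w") as fh:
        fh.write("{}")

# "ListFile" step: enumerate once; the queue holds only metadata (names).
queue = deque(sorted(n for n in os.listdir(source_dir) if n.endswith(".json")))

# "FetchFile" step: pull content later; queue depth = files remaining.
remaining_after_each = []
while queue:
    name = queue.popleft()
    with open(os.path.join(source_dir, name), "rb") as fh:
        fh.read()  # process the payload here
    remaining_after_each.append(len(queue))
```

In NiFi terms, the connection between ListFile and FetchFile plays the role of this queue, and its depth is the progress indicator Jean-Sebastien was asking for.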

[ANNOUNCE] Apache NiFi 1.13.1 release

2021-03-16 Thread Joe Witt
Hello

The Apache NiFi team would like to announce the release of Apache NiFi
1.13.1.

This is a stability focused release including nearly 50 bug fixes and
improvements.

Apache NiFi is an easy to use, powerful, and reliable system to process and
distribute
data.  Apache NiFi was made for dataflow.  It supports highly configurable
directed graphs
of data routing, transformation, and system mediation logic.

More details on Apache NiFi can be found here:
https://nifi.apache.org/

The release artifacts can be downloaded from here:
https://nifi.apache.org/download.html

Maven artifacts have been made available and mirrored as per normal ASF
artifact processes.

Issues closed/resolved for this list can be found here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12348700

Release note highlights can be found here:
https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.13.1

Thank you
The Apache NiFi team


Re: Any known issue on SplitRecord?

2021-03-17 Thread Joe Witt
Juan

We found a bug in 1.13.1 today as reported here
https://issues.apache.org/jira/browse/NIFI-8337 and
https://issues.apache.org/jira/browse/NIFI-8334.

We will have a 1.13.2 out asap to fix this and the regression now has tests
to prevent it in the future.

Thanks
Joe

On Wed, Mar 17, 2021 at 8:44 PM Juan Pablo Gardella <
gardellajuanpa...@gmail.com> wrote:

> Hi all,
>
> I am using the latest NiFi version and SplitRecord works only once and then
> hangs:
>
> [image: image.png]
>
> I cannot stop it also.
>
> Juan
>


Regression in 1.13.1 - creating 1.13.2

2021-03-17 Thread Joe Witt
All



Based on two reports received today we've found a regression in
process session handling in NiFi 1.13.1.  This is fixed on main and we
will have a 1.13.2 RC1 up for vote in a couple hours.

The core issue found is
https://issues.apache.org/jira/browse/NIFI-8337.  This is the same
root problem as reported with
https://issues.apache.org/jira/browse/NIFI-8334

Thanks
Joe


Re: Any known issue on SplitRecord?

2021-03-17 Thread Joe Witt
I should clarify that I am not positive it is the same issue but it is
certainly possible especially if this worked in 1.13.0

Thanks

On Wed, Mar 17, 2021 at 8:46 PM Joe Witt  wrote:

> Juan
>
> We found a bug in 1.13.1 today as reported here
> https://issues.apache.org/jira/browse/NIFI-8337 and
> https://issues.apache.org/jira/browse/NIFI-8334.
>
> We will have a 1.13.2 out asap to fix this and the regression now has
> tests to prevent it in the future.
>
> Thanks
> Joe
>
> On Wed, Mar 17, 2021 at 8:44 PM Juan Pablo Gardella <
> gardellajuanpa...@gmail.com> wrote:
>
>> Hi all,
>>
>> I am using latest nifi version and SplitRecord works only once and then
>> hangs:
>>
>> [image: image.png]
>>
>> I cannot stop it also.
>>
>> Juan
>>
>


Re: Any known issue on SplitRecord?

2021-03-17 Thread Joe Witt
Thanks Juan - that would be very valuable actually.  I'll send you a link
to a build here in an hour or so. If you can test that and let us know that
will help us with the release candidate voting process quite a bit.

Thanks

On Wed, Mar 17, 2021 at 8:49 PM Juan Pablo Gardella <
gardellajuanpa...@gmail.com> wrote:

> Wow that is fast! You are awesome, thanks Joe. I will test it.
>
> Juan
>
> On Thu, 18 Mar 2021 at 00:47, Joe Witt  wrote:
>
>> Juan
>>
>> We found a bug in 1.13.1 today as reported here
>> https://issues.apache.org/jira/browse/NIFI-8337 and
>> https://issues.apache.org/jira/browse/NIFI-8334.
>>
>> We will have a 1.13.2 out asap to fix this and the regression now has
>> tests to prevent it in the future.
>>
>> Thanks
>> Joe
>>
>> On Wed, Mar 17, 2021 at 8:44 PM Juan Pablo Gardella <
>> gardellajuanpa...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I am using latest nifi version and SplitRecord works only once and then
>>> hangs:
>>>
>>> [image: image.png]
>>>
>>> I cannot stop it also.
>>>
>>> Juan
>>>
>>


Re: SplitText issue

2021-03-19 Thread Joe Witt
Michal - It is on track to be available within about 3 hours.

On Fri, Mar 19, 2021 at 7:35 AM Michal Tomaszewski
 wrote:
>
> Thanks!
>
> Regards,
>
> Michal
>
>
>
> From: Mark Payne 
> Sent: Friday, March 19, 2021 3:01 PM
> To: users@nifi.apache.org
> Subject: Re: SplitText issue
>
>
>
> Michal,
>
>
>
> We are working on a 1.13.2 release currently that should address this.
>
>
>
> Thanks
>
> -Mark
>
>
>
>
>
> On Mar 19, 2021, at 9:31 AM, Michal Tomaszewski  
> wrote:
>
>
>
> Hi,
>
>
>
> In versions 1.13.1 and 1.14 there is a problem with SplitText. In 1.13.0 and 1.10 
> the component works as expected, so the problem is in one of the fixes between 
> 1.13.0 and 1.13.1.
>
>
>
> Test scenario:
>
> We take 1.2GB file from HDFS
> We divide it into rows/lines
> Due to performance limitations we use 4 splittexts one after another: divide 
> flowfile into 5M rows, after that into flowfiles containing 500k rows, 50k 
> rows and at the end 5k rows.
>
>
>
> We verified the same problem exists when using only one SplitText component 
> and smaller input flows (e.g. ~5000 rows on input divided into flows having 
> 200 rows each is also not working)
>
>
>
> At the end we expect flowfiles having 5k rows:
>
> 
>
>
>
>
>
> Result:
>
> currently flowfiles are divided into fragments (sometimes even 1.5 rows 
> instead of 5k rows) and it sometimes even divides in the middle of a row:
>
>
>
> example #1 of output flow:
>
>
>
> 
>
>
>
> Example #2 of output flow:
>
> 
>
>
>
>
>
>
>
>
>
> Regards,
>
>Michał
>

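The behaviour Michal expects — chunks of whole rows, never a split mid-row — can be sketched in plain Python (illustrative only; the row counts are scaled down from the 5M/500k/50k/5k cascade described above):

```python
def split_lines(lines, chunk_size):
    """Yield consecutive chunks of at most chunk_size whole lines."""
    for i in range(0, len(lines), chunk_size):
        yield lines[i:i + chunk_size]

rows = [f"row-{i}" for i in range(100)]
chunks = list(split_lines(rows, 30))

# Every chunk boundary falls between whole rows; the bug reported above
# was chunks of the wrong size and splits landing in the middle of a row.
```

Any correct line splitter preserves this invariant: concatenating the chunks reproduces the original rows exactly, with no partial row at any boundary.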

Re: Writing parquet files to S3

2021-03-22 Thread Joe Witt
Not responding to the real question in the thread but "I'm using NIFI
1.13.1.".  Please switch to 1.13.2 right away due to a regression in
1.13.1


On Mon, Mar 22, 2021 at 12:24 AM Vibhath Ileperuma
 wrote:
>
> Hi Bryan,
>
> I'm planning to add these generated parquet files to an impala S3 table.
> I noticed that impala written parquet files contain only one row group. 
> That's why I'm trying to write one row group per file.
>
> However, I tried to create small parquet files (Snappy compressed) first and 
> use a MergeRecord Processor with a ParquetRecordSetWriter in which the row 
> group size is set to 256 MB to generate parquet files with one row group. The 
> configurations I used,
>
> Merge Strategy: Bin-Packing Algorithm
> Minimum Number of Records: 1
> Maximum Number of Records: 2500000 (2.5 million)
>  Minimum Bin Size : 230 MB
> Maximum Bin Size : 256 MB
> Max Bin Age: 20 minutes
>
> Note that, above mentioned small parquet files usually contain 200,000 
> records and size is about 21 MB- 22 MB. Hence about 12 files should be merged 
> to generate one file.
>
> But when I run the processor, it always merges 19 files and generates files 
> of size 415 MB - 417 MB.
>
> I'm using NIFI 1.13.1. Could you please let me know how to resolve this issue.
>
> Thanks & Regards
>
> Vibhath Ileperuma
>
>
>
>
>
> On Fri, Mar 19, 2021 at 8:45 PM Bryan Bende  wrote:
>>
>> Hello,
>>
>> What would the reason be to need only one row group per file? Parquet
>> files by design can have many row groups.
>>
>> The ParquetRecordSetWriter won't be able to do this since it is just
>> given an output stream to write all the records to, which happens to
>> be the outputstream for one flow file.
>>
>> -Bryan
>>
>> On Fri, Mar 19, 2021 at 10:31 AM Vibhath Ileperuma
>>  wrote:
>> >
>> > Hi all,
>> >
>> > I'm developing a NIFI flow to convert a set of csv data to parquet format 
>> > and upload them to a S3 bucket. I use a 'ConvertRecord' processor with a 
>> > csv reader and a parquet record set writer to convert data and use a 
>> > 'PutS3Object' to send it to S3 bucket.
>> >
>> > When converting, I need to make sure the parquet row group size is 256 MB 
>> > and each parquet file contains only one row group. Even Though it is 
>> > possible to set the row group size in ParquetRecordSetWriter, I couldn't 
>> > find a way to make sure each parquet file contains only one row group (If 
>> > a csv file contains data  more than required for a 256MB row group, 
>> > multiple parquet files should be generated).
>> >
>> > I would be grateful if you could suggest a way to do this.
>> >
>> > Thanks & Regards
>> >
>> > Vibhath Ileperuma
>> >
>> >
>> >
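A back-of-envelope check of the numbers in Vibhath's MergeRecord report (values taken from the thread; the per-file size is an approximation) shows why 19-file bins look wrong:

```python
import math

records_per_file = 200_000
mb_per_file = 21.5            # midpoint of the reported 21-22 MB
min_bin_mb = 230
max_records = 2_500_000

# Files needed to satisfy the minimum bin size.
files_for_min_size = math.ceil(min_bin_mb / mb_per_file)

# Files allowed before hitting the record cap.
files_for_record_cap = max_records // records_per_file

# A bin should therefore close after roughly 11-12 files (~236-258 MB),
# so the observed 19-file, ~415 MB bins suggest the configured size and
# record limits were not being honoured by that release.
```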


Re: Creating recursive missing folders with PutSmbFile

2021-03-22 Thread Joe Witt
pretty sure SMB is super popular - it is just that for the cases we
typically engage in, SMB isn't used as the protocol to access data :)

Agree with the rest of that

Thanks

On Mon, Mar 22, 2021 at 11:13 AM Mark Payne  wrote:
>
> Jens,
>
> In order to review & merge a PR, there are two important things that need to 
> happen:
>
> 1. A NiFi committer must review the code to make sure that the changes are 
> safe, make sense, conducive with the architecture, is adhering to best 
> practices, doesn’t break automated tests, etc.
> 2. The code needs to be tested - typically this is accomplished both manually 
> and in an automated sense. Sometimes only manually, sometimes automated.
>
> For this case, we really would need someone other than the contributor who 
> put up the PR to test this manually to verify that it works. The problem is 
> that SMB isn’t really that popular, I don’t think. So we would need someone 
> who can verify that the changes work as desired. This doesn’t need to be a 
> committer.
>
> If you’re able to build that branch and verify the changes and then report 
> back any positive or negative findings, that can go a long way to help in the 
> review process.
>
> Thanks
> -Mark
>
> On Mar 22, 2021, at 4:03 AM, Jens M. Kofoed  wrote:
>
> Hi
>
> The following JIRA: https://issues.apache.org/jira/browse/NIFI-7863, was 
> created October 1, 2020 and the user Jaya has created a PR October 9, 2020. 
> but nothing have happens since.
>
> Are there someone in the community which is able to help implement a fix?
> We had looked forward to see the fix included in 1.13, but unfortunately it 
> is not.
>
> Kind regards
> Jens M. Kofoed
>
>
>
>


Re: ExtractText 1.13.1 only one regex evaluated on CSV file?

2021-03-25 Thread Joe Witt
Please avoid 1.13.1. Grab 1.13.2

On Wed, Mar 24, 2021 at 11:33 PM Hendrik Ruijter <
hendrik.ruij...@verisure.com> wrote:

> [Sending again since I cannot see this mail in the archive]
>
> Hello, ExtractText 1.12.1 evaluates the regex for each flowfile and adds
> attributes to each flowfile. However, ExtractText 1.13.1 appears to
> evaluate the regex for the first flowfile only, adding the same attributes
> to all flowfiles.
>
>
>
> Use case is a CSV file with content,
>
>
>
> 39204375,1583254
>
> 39089067,1559876
>
> 39192276,1548472
>
> 38915030,1575858
>
> 38918361,1538764
>
> 39190165,1549728
>
> 39090656,1569006
>
> 39147201,1549234
>
> 39183889,1569924
>
> 39101853,1566678
>
>
>
> where 39204375,1583254 are added as attributes to all 10 flowfiles by
> ExtractText 1.13.1.
>
>
>
>
>
> SplitText, Line Split Count = 1
>
> ExtractText,
>
>
>
> Best Regards
>
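The behaviour Hendrik expects from ExtractText after SplitText — each one-line flowfile yielding its own attribute pair — can be modelled with a plain Python regex (illustrative only; NiFi's actual attribute naming and matching differ):

```python
import re

# One capture group per CSV column, matched once per (split) line.
pattern = re.compile(r"^(\d+),(\d+)$")
lines = ["39204375,1583254", "39089067,1559876", "39192276,1548472"]

attribute_pairs = [pattern.match(line).groups() for line in lines]
# Each line produces its own pair; the reported bug was every flowfile
# receiving the first line's values instead.
```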


Re:

2021-03-26 Thread Joe Witt
We would need to know which NiFi version, Java version, and OS you're on.
From the jetty output this appears potentially quite old.  Please try on
1.13.2

Thanks

On Fri, Mar 26, 2021 at 5:15 AM Ralph Vercauteren  wrote:

> Hi All,
>
> I get the next error connecting to the nifi website:
> [image: image.png]
>
> And in the nifi-users.log I see this error. I tried to google it but I can't
> find any solution. What could be the issue?
> 2021-03-26 06:36:10,595 ERROR [NiFi Web Server-114]
> o.a.nifi.web.api.config.ThrowableMapper An unexpected error has occurred:
> java.lang.NoClassDefFoundError:
> org/apache/nifi/authorization/StandardAuthorizableLookup$StandardConnectionAuthorizable.
> Returning Internal Server Error response.
> java.lang.NoClassDefFoundError:
> org/apache/nifi/authorization/StandardAuthorizableLookup$StandardConnectionAuthorizable
> at
> org.apache.nifi.authorization.StandardAuthorizableLookup.getConnection(StandardAuthorizableLookup.java:274)
> at
> org.apache.nifi.web.api.FlowFileQueueResource.lambda$createFlowFileListing$0(FlowFileQueueResource.java:335)
> at
> org.apache.nifi.web.StandardNiFiServiceFacade.authorizeAccess(StandardNiFiServiceFacade.java:371)
> at
> org.apache.nifi.web.StandardNiFiServiceFacade$$FastClassBySpringCGLIB$$358780e0.invoke()
> at
> org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
> at
> org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:736)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
> at
> org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
> at
> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
> at
> org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:671)
> at
> org.apache.nifi.web.StandardNiFiServiceFacade$$EnhancerBySpringCGLIB$$cc01c7f0.authorizeAccess()
> at
> org.apache.nifi.web.api.ApplicationResource.withWriteLock(ApplicationResource.java:705)
> at
> org.apache.nifi.web.api.FlowFileQueueResource.createFlowFileListing(FlowFileQueueResource.java:331)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76)
> at
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148)
> at
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191)
> at
> org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:200)
> at
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103)
> at
> org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493)
> at
> org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:415)
> at
> org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:104)
> at
> org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:277)
> at org.glassfish.jersey.internal.Errors$1.call(Errors.java:272)
> at org.glassfish.jersey.internal.Errors$1.call(Errors.java:268)
> at org.glassfish.jersey.internal.Errors.process(Errors.java:316)
> at org.glassfish.jersey.internal.Errors.process(Errors.java:298)
> at org.glassfish.jersey.internal.Errors.process(Errors.java:268)
> at
> org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:289)
> at
> org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:256)
> at
> org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:703)
> at
> org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:416)
> at
> org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:370)
> at
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:389)
> at
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:342)
> at
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.jav

Re:

2021-03-26 Thread Joe Witt
Thanks Ralph.  I'm not aware of any specific authorization classpath issues
in that release but from your stack trace it seems likely there was.  I do
recommend you upgrade and/or verify your settings.  You also should
consider updating to the latest available Java 8 build (unlikely that is
related to the issue but I recommend that for other reasons).

Thanks

On Fri, Mar 26, 2021 at 7:45 AM Ralph Vercauteren  wrote:

> We using nifi version 1.8.0
>
> java version "1.8.0_202"
> Java(TM) SE Runtime Environment (build 1.8.0_202-b08)
> Java HotSpot(TM) 64-Bit Server VM (build 25.202-b08, mixed mode)
>
> Thanks in advance.
>
> With regards,
> Mit freundlichem Gruß,
> Met vriendelijke groet,
>
> *Ralph Vercauteren*
> Technical Architect QAD Automation Solutions
> Mobile NL: +31 6 5397 7230
> r...@qad.com
>
> This e-mail may contain QAD proprietary information and should be treated
> as confidential.
>
>
> On Fri, Mar 26, 2021 at 3:26 PM Joe Witt  wrote:
>
>> We would need to know which NiFi version, Java version, and OS you're
>> on.  From the jetty output this appears potentially quite old.  Please try
>> on 1.13.2
>>
>> Thanks
>>
>> On Fri, Mar 26, 2021 at 5:15 AM Ralph Vercauteren  wrote:
>>
>>> Hi All,
>>>
>>> I get the following error when connecting to the nifi website:
>>> [image: image.png]
>>>
>>> And in the nifi-users.log I see this error. I tried to google it but I
>>> can't find any solution. What could be the issue?
>>> 2021-03-26 06:36:10,595 ERROR [NiFi Web Server-114]
>>> o.a.nifi.web.api.config.ThrowableMapper An unexpected error has occurred:
>>> java.lang.NoClassDefFoundError:
>>> org/apache/nifi/authorization/StandardAuthorizableLookup$StandardConnectionAuthorizable.
>>> Returning Internal Server Error response.
>>> java.lang.NoClassDefFoundError:
>>> org/apache/nifi/authorization/StandardAuthorizableLookup$StandardConnectionAuthorizable
>>> at
>>> org.apache.nifi.authorization.StandardAuthorizableLookup.getConnection(StandardAuthorizableLookup.java:274)
>>> at
>>> org.apache.nifi.web.api.FlowFileQueueResource.lambda$createFlowFileListing$0(FlowFileQueueResource.java:335)
>>> at
>>> org.apache.nifi.web.StandardNiFiServiceFacade.authorizeAccess(StandardNiFiServiceFacade.java:371)
>>> at
>>> org.apache.nifi.web.StandardNiFiServiceFacade$$FastClassBySpringCGLIB$$358780e0.invoke()
>>> at
>>> org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
>>> at
>>> org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:736)
>>> at
>>> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
>>> at
>>> org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
>>> at
>>> org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
>>> at
>>> org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:671)
>>> at
>>> org.apache.nifi.web.StandardNiFiServiceFacade$$EnhancerBySpringCGLIB$$cc01c7f0.authorizeAccess()
>>> at
>>> org.apache.nifi.web.api.ApplicationResource.withWriteLock(ApplicationResource.java:705)
>>> at
>>> org.apache.nifi.web.api.FlowFileQueueResource.createFlowFileListing(FlowFileQueueResource.java:331)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:498)
>>> at
>>> org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76)
>>> at
>>> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148)
>>> at
>>> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191)
>>> at
>>> org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceM

Re: NiFi 1.11.4 Custom Processor Development

2021-04-13 Thread Joe Witt
Hello

In moving to support maven 3.8 we found only https based repos are allowed
by default.  In addressing this we updated the build to move away from
bintray and another location as well.

This is now the case when you build master.  You could try to cherry pick
and edit the key commits or hang tight for us to kick out a 1.14 release.
No time table for that yet though.

Thanks

On Tue, Apr 13, 2021 at 7:45 AM  wrote:

> Hi There,
>
> I am in the process of upgrading the version of the nifi-nar-bundle from
> 1.9.2 to 1.12.1; however, I've hit an unexpected issue.
>
> We develop on an offline platform, with Maven Central Mirrored using a
> Nexus Repository Manager, with all dependencies delivered through this
> mechanism. Today when I have changed the version from 1.9.2 to 1.12.1 I
> have found that the parent (I assume for the groovy processor) has a
> dependency on groovy-eclipse-batch version 2.5.4-01 which seems to be only
> available from a Groovy Repository (https://dl.bintray.com/groovy/maven/)
> and not maven central.
>
> I also noticed that bintray will be closing shortly  (
> https://jfrog.com/blog/into-the-sunset-bintray-jcenter-gocenter-and-chartcenter/),
> so I guess the dependency will need correcting before then?
>
> Kind Regards,
>
> Nathan
>
> Nathan English
> Applications Specialist - Cyber Delivery & DevOps
>


Re: NiFi 1.11.4 Custom Processor Development

2021-04-13 Thread Joe Witt
Nathan

Nothing has occurred that should result in issues with using custom
processors.  The challenge comes in for those trying to build on older
lines as bintray ages out and as tools like Maven (rightfully) push to
https only based repositories.

Thanks

On Tue, Apr 13, 2021 at 8:04 AM Russell Bateman  wrote:
>
> There shouldn't be any problem. I have many custom processors in a NAR I 
> haven't rebuilt since 1.12 and we use them successfully. They're just missing 
> versions because we had not yet upgraded the Maven we used to build the NAR 
> (not exactly relevant, I know, but I thought I'd I would point this out).
>
>
> On 4/13/21 8:55 AM, nathan.engl...@bt.com wrote:
>
> Hi Joe,
>
> Thanks for that information.
>
> Will there be any issue in running a Custom NAR file built against 1.9.2 on a 
> different-versioned cluster? We haven't experienced anything yet, but would 
> hate to spend ages debugging an issue that is the result of this!
>
> Kind Regards,
>
> Nathan
> 
> From: Joe Witt 
> Sent: Tuesday, April 13, 2021 5:48 pm
> To: users@nifi.apache.org
> Subject: Re: NiFi 1.11.4 Custom Processor Development
>
> Hello
>
> In moving to support maven 3.8 we found only https based repos are allowed by 
> default.  In addressing this we updated the build to move away from bintray 
> and another location as well.
>
> This is now the case when you build master.  You could try to cherry pick and 
> edit the key commits or hang tight for us to kick out a 1.14 release.  No 
> time table for that yet though.
>
> Thanks
>
> On Tue, Apr 13, 2021 at 7:45 AM  wrote:
>>
>> Hi There,
>>
>> I am in the process of upgrading the version of the nifi-nar-bundle from 
>> 1.9.2 to 1.12.1; however, I've hit an unexpected issue.
>>
>> We develop on an offline platform, with Maven Central Mirrored using a Nexus 
>> Repository Manager, with all dependencies delivered through this mechanism. 
>> Today when I have changed the version from 1.9.2 to 1.12.1 I have found that 
>> the parent (I assume for the groovy processor) has a dependency on 
>> groovy-eclipse-batch version 2.5.4-01 which seems to be only available from 
>> a Groovy Repository (https://dl.bintray.com/groovy/maven/) and not maven 
>> central.
>>
>> I also noticed that bintray will be closing shortly  
>> (https://jfrog.com/blog/into-the-sunset-bintray-jcenter-gocenter-and-chartcenter/),
>>  so I guess the dependency will need correcting before then?
>>
>> Kind Regards,
>>
>> Nathan
>>
>> Nathan English
>> Applications Specialist - Cyber Delivery & DevOps
>
>


Re: Nifi throws an error when reading a large csv file

2021-04-14 Thread Joe Witt
How large is each line expected to be?  You could have a massive line,
or lines much larger than expected.  Or you could be creating far more
flowfiles than intended.  If you cut the file in size does it work
better?  Will need more data to help narrow in but obviously we're all
very interested to know what is happening.  These processors and the
readers/writers are meant to be quite bullet proof and handle very
very large data easily in most cases.

On Wed, Apr 14, 2021 at 10:07 AM Vibhath Ileperuma
 wrote:
>
> Hi Chris,
>
> As you have mentioned, I am trying to split the large csv file in multiple 
> stages. But this error is thrown at the first stage even without creating a 
> single flow file.
> It seems like the issue is not with the processor, but with the CSV record 
> reader. This error is thrown while reading the csv file. I tried to write the 
> data in the large csv file into a kudu table using a putKudu processor with 
> the same CSV reader. Then also I got the same error message.
>
> Hi Otto,
>
> Only following information is available in log file related to the exception
>
> 2021-04-14 17:48:28,628 ERROR [Timer-Driven Process Thread-1] 
> o.a.nifi.processors.standard.SplitRecord 
> SplitRecord[id=c9a981db-0178-1000-363d-c767653a6f34] 
> SplitRecord[id=c9a981db-0178-1000-363d-c767653a6f34] failed to process 
> session due to java.lang.OutOfMemoryError: Requested array size exceeds VM 
> limit; Processor Administratively Yielded for 1 sec: 
> java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>
> java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>
> 2021-04-14 17:48:28,628 WARN [Timer-Driven Process Thread-1] 
> o.a.n.controller.tasks.ConnectableTask Administratively Yielding 
> SplitRecord[id=c9a981db-0178-1000-363d-c767653a6f34] due to uncaught 
> Exception: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>
> java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>
> Thanks & Regards
>
> Vibhath Ileperuma
>
>
>
>
> On Wed, Apr 14, 2021 at 7:47 PM Otto Fowler  wrote:
>>
>> What is the complete stack trace of that exception?
>>
>> On Apr 14, 2021, at 02:36, Vibhath Ileperuma  
>> wrote:
>>
>> Requested array size exceeds VM limit
>>
>>
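Joe's question about line length matters because a reader that streams line-by-line only ever holds one line in memory, so a single pathological "line" (e.g. a huge file with no newlines) forces one enormous allocation. A minimal sketch of the bounded-memory pattern (plain Python, not the NiFi CSV reader itself):

```python
import io

# Stand-in for a very large CSV; in practice this would be an open file.
data = io.StringIO("a,1\nb,2\nc,3\n")

line_count = 0
longest_line = 0
for line in data:                 # iteration streams one line at a time
    line_count += 1
    longest_line = max(longest_line, len(line))

# Peak memory here is proportional to the longest single line, not the
# file size -- which is why one missing-newline file can still blow the heap.
```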


Re: NiFi 1.11.4 Custom Processor Development

2021-04-15 Thread Joe Witt
Nathan

Yes and I believe the changed dep version which doesnt use bintray is on
main now and so will be in 1.14 when we release it.

Thanks

On Thu, Apr 15, 2021 at 6:39 AM  wrote:

> Hi Joe, Russ,
>
> Many thanks for that information.
>
> Kind Regards,
>
> Nathan
>
> -Original Message-
> From: Joe Witt [mailto:joe.w...@gmail.com]
> Sent: 13 April 2021 18:16
> To: users@nifi.apache.org
> Subject: Re: NiFi 1.11.4 Custom Processor Development
>
> Nathan
>
> Nothing has occurred that should result in issues with using custom
> processors.  The challenge comes in for those trying to build on older
> lines as bintray ages out and as tools like Maven (rightfully) push to
> https only based repositories.
>
> Thanks
>
> On Tue, Apr 13, 2021 at 8:04 AM Russell Bateman 
> wrote:
> >
> > There shouldn't be any problem. I have many custom processors in a NAR I
> haven't rebuilt since 1.12 and we use them successfully. They're just
> missing versions because we had not yet upgraded the Maven we used to build
> the NAR (not exactly relevant, I know, but I thought I'd I would point this
> out).
> >
> >
> > On 4/13/21 8:55 AM, nathan.engl...@bt.com wrote:
> >
> > Hi Joe,
> >
> > Thanks for that information.
> >
> > Will there be any issue in running a Custom NAR file built against 1.9.2
> on a different-versioned cluster? We haven't experienced anything yet, but
> would hate to spend ages debugging an issue that is the result of this!
> >
> > Kind Regards,
> >
> > Nathan
> > 
> > From: Joe Witt 
> > Sent: Tuesday, April 13, 2021 5:48 pm
> > To: users@nifi.apache.org
> > Subject: Re: NiFi 1.11.4 Custom Processor Development
> >
> > Hello
> >
> > In moving to support maven 3.8 we found only https based repos are
> allowed by default.  In addressing this we updated the build to move away
> from bintray and another location as well.
> >
> > This is now the case when you build master.  You could try to cherry
> pick and edit the key commits or hang tight for us to kick out a 1.14
> release.  No time table for that yet though.
> >
> > Thanks
> >
> > On Tue, Apr 13, 2021 at 7:45 AM  wrote:
> >>
> >> Hi There,
> >>
> >> I am in the process of upgrading the version of the nifi-nar-bundle
> from 1.9.2 to 1.12.1 however I've hit a unexpected issue.
> >>
> >> We develop on an offline platform, with Maven Central Mirrored using a
> Nexus Repository Manager, with all dependencies delivered through this
> mechanism. Today when I have changed the version from 1.9.2 to 1.12.1 I
> have found that the parent (I assume for the groovy processor) has a
> dependency on groovy-eclipse-batch version 2.5.4-01 which seems to be only
> available from a Groovy Repository (
> https://dl.bintray.com/groovy/maven/)
> and not maven central.
> >>
> >> I also noticed that bintray will be closing shortly  (
> https://jfrog.com/blog/into-the-sunset-bintray-jcenter-gocenter-and-chartcenter/),
> so I guess the dependency will need correcting before then?
> >>
> >> Kind Regards,
> >>
> >> Nathan
> >>
> >> Nathan English
> >> Applications Specialist - Cyber Delivery & DevOps
> >
> >
>


Re: Insufficient Permission to Clear Queue

2021-04-28 Thread Joe Witt
Shawn

Yeah probably not our best response to help you figure out and resolve the
issue on your own.  Please file a JIRA with what the settings were, what
you tried and received, and how you fixed and why it was confusing.  Just
so whoever picks that up will have the context and can make it more like
what would have helped you understand and resolve the issue sooner.

Thanks

On Wed, Apr 28, 2021 at 7:46 AM Shawn Weeks 
wrote:

> And that was it, but shouldn’t the error message have listed the node as
> not having permission instead of saying my user didn’t have permission?
>
>
>
> This right here is totally a lie, because my user did have the permission
> it was the node itself that didn’t and that’s not mentioned anywhere in the
> error.
>
>
>
> AccessDeniedExceptionMapper identity[sweeks], groups[admin] does not have
> permission to access the requested resource.
>
>
>
> Thanks
>
> Shawn
>
>
>
> *From:* Chris Sampson 
> *Sent:* Wednesday, April 28, 2021 9:08 AM
> *To:* users@nifi.apache.org
> *Subject:* Re: Insufficient Permission to Clear Queue
>
>
>
> Do all of the NiFi instances themselves have the same permissions?
>
>
>
> Both the operating user (i.e. you) and the Node Identities within the
> cluster need to have permissions for viewing and modifying data (had this
> issue myself earlier today until I remembered what was needed).
>
>
>
>
> ---
>
> *Chris Sampson*
>
> IT Consultant
>
> chris.samp...@naimuri.com
>
> 
>
>
>
>
>
> On Wed, 28 Apr 2021 at 14:06, Shawn Weeks 
> wrote:
>
> On a new NiFi 1.13.2 installation I’m receiving an Insufficient Permission
> error when I try and clear a queue on the root canvas. My user is a member
> of a local group called “admin” that has been assigned every permission on
> the root canvas include “view the data”, “modify the data”, etc, as well as
> everything under the system policies. I can’t figure out what permission
> I’m missing. I also can’t view the queue either.
>
>
>
> 2021-04-28 13:01:44,874 INFO [NiFi Web Server-324]
> o.a.n.w.s.NiFiAuthenticationFilter Authentication success for sweeks
>
> 2021-04-28 13:01:44,875 INFO [NiFi Web Server-324]
> o.a.n.w.a.c.AccessDeniedExceptionMapper identity[sweeks], groups[admin]
> does not have permission to access the requested resource. Unable to modify
> the data for Processor with ID 1887dab4-0179-1000--99018b60.
> Returning Forbidden response.
>
> 2021-04-28 13:01:44,876 INFO [NiFi Web Server-312]
> o.a.n.w.s.NiFiAuthenticationFilter Attempting request for (<
> node1.example.org>) POST
> https://node1.example.org:8443/nifi-api/flowfile-queues/18882d1e-0179-1000--04f46c76/drop-requests
> (source ip: 10.208.126.182)
>
> 2021-04-28 13:01:44,876 INFO [NiFi Web Server-312]
> o.a.n.w.s.NiFiAuthenticationFilter Authentication success for sweeks
>
>
>
> Thanks
>
> Shawn
>
>


Re: Broken pipe write failed errors

2021-05-29 Thread Joe Witt
What JVM are you using?

Thanks

On Sat, May 29, 2021 at 11:16 AM Juan Pablo Gardella <
gardellajuanpa...@gmail.com> wrote:

> Not related to Nifi, but I faced the same type of issue for endpoints
> behind a proxy which takes more than 30 seconds to answer. Fixed by
> replacing Apache Http client by OkHttp. I did not investigate further, just
> simply replaced one library by another and the error was fixed.
>
>
> Juan
>
> On Sat, 29 May 2021 at 15:08, Robert R. Bruno  wrote:
>
>> I wanted to see if anyone has any ideas on this one.  Since upgrading to
>> 1.13.2 from 1.9.2 we are starting to see broken pipe (write failed) errors
>> from a few invokeHttp processors.
>>
>> It is happening to processors talking to different endpoints, so I am
>> suspecting it is on the nifi side.  We are now using load balanced queues
>> throughout our flow.  Is it possible we are hitting a http connection
>> resource issue or something like that? A total guess I'll admit.
>>
>> If this could be it, does anyone know which parameter(s) to play with in
>> the properties file?  I know there is one setting for jetty threads and
>> another for max concurrent requests, but it isn't quite clear to me if they
>> are at all involved with invokeHttp calls.
>>
>> Thanks in advance!
>>
>> Robert
>>
>


Re: Broken pipe write failed errors

2021-05-29 Thread Joe Witt
K. We have seen specific jvm versions causing issues with socket handling.
But had not seen it on Java 11 though may be possible.   Is there a full
stack trace?

On Sat, May 29, 2021 at 12:00 PM Robert R. Bruno  wrote:

> We upgraded to java 11 when we upgrade to 1.13.2 we were on java 8 with
> 1.9.2.
>
> On Sat, May 29, 2021, 14:21 Joe Witt  wrote:
>
>> What JVM are you using?
>>
>> Thanks
>>
>> On Sat, May 29, 2021 at 11:16 AM Juan Pablo Gardella <
>> gardellajuanpa...@gmail.com> wrote:
>>
>>> Not related to Nifi, but I faced the same type of issue for endpoints
>>> behind a proxy which takes more than 30 seconds to answer. Fixed by
>>> replacing Apache Http client by OkHttp. I did not investigate further, just
>>> simply replaced one library by another and the error was fixed.
>>>
>>>
>>> Juan
>>>
>>> On Sat, 29 May 2021 at 15:08, Robert R. Bruno  wrote:
>>>
>>>> I wanted to see if anyone has any ideas on this one.  Since upgrading
>>>> to 1.13.2 from 1.9.2 we are starting to see broken pipe (write failed)
>>>> errors from a few invokeHttp processers.
>>>>
>>>> It is happening to processors talking to different endpoints, so I am
>>>> suspecting it is on the nifi side.  We are now using load balanced queues
>>>> throughout our flow.  Is it possible we are hitting a http connection
>>>> resource issue or something like that? A total guess I'll admit.
>>>>
>>>> If this could be it, does anyone know which parameter(s) to play with
>>>> in the properties file?  I know there is one setting for jetty threads and
>>>> another for max concurrent requests, but it isn't quite clear to me of they
>>>> are at all involved with invokeHttp calls.
>>>>
>>>> Thanks in advance!
>>>>
>>>> Robert
>>>>
>>>


Re: Regarding jira issue: NIFI-7856

2021-06-18 Thread Joe Witt
Sanjeet

If I understand correctly you are trying to backport a fix we have
completed in the 1.13.x line to your 1.12.x line and asking for assistance
with that.

I recommend you upgrade to 1.13.x instead of trying to backport.  However,
yes on quick read your stated approach does seem fine.

Thanks

On Fri, Jun 18, 2021 at 5:32 AM sanjeet rath  wrote:

> Hi,
>
> Any help input on my trailed mail is really appriciated .
> I am almost stuck here from past 2 weeks.
> Tried almost all the option i have.
>
> Thank u everyone in advance.
> Regards,
> Sanjeet
>
> On Tue, 15 Jun 2021, 7:56 pm sanjeet rath,  wrote:
>
>> Hi,
>>
>> The symptoms mentioned in the jira issue
>> (https://issues.apache.org/jira/browse/NIFI-7856), I am
>> observing in one of our PROD clusters.
>>
>> ERROR [Compress Provenance Logs-1-thread-2] o.a.n.p.s.EventFileCompressor 
>> Failed to compress ./provenance_repository/1693519.prov on rollover
>> java.io.FileNotFoundException: ./provenance_repository/1693519.prov (No such 
>> file or directory)
>>
>>
>> I saw the code is fixed in the 1.13 version with the below file changes in
>> nifi-provenance-repository-bundle.
>>
>>
>> nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/store/RecordWriterLease.java
>>
>> nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/store/WriteAheadStorePartition.java
>> So I have modified the above 2 file changes on top of my 1.12.1 version of
>> nifi-provenance-repository-bundle and built the nifi-provenance-repository
>> NAR file. Then I will deploy this new NAR file to the /lib folder of the
>> 1.12.1 version of NiFi.
>>
>> Is my above approach correct?
>>
>> Second thing is, I am facing one issue in 1.12 version, unable to
>> replicate the provenance error in the lower environment (Tried with the
>> 7856.xml template attached in the jira by Mark)
>> So I am not able to tell whether the above change I made worked or
>> not.
>> As i can not directly deploy the Nar to prod env where the error is
>> constantly coming in every hour.
>>
>> Could you please help me replicate this issue in the 1.12.1 version?
>> Along with the template, are there any other config changes I need to make
>> to replicate the issue?
>>
>>
>> Regards,
>> --
>> Sanjeet Kumar Rath,
>>
>>
>>


Re: Taking Huge Time for Connect all Nodes to NIFI cluster

2021-06-29 Thread Joe Witt
Hello

A cluster that size should be fine. We did make various improvement to
cluster behavior and startup times though.  What prevents you from moving
to 1.13?

How many flowfiles are in the repo when restarting is taking that long?

thanks

On Tue, Jun 29, 2021 at 7:38 AM Joe Obernberger <
joseph.obernber...@gmail.com> wrote:

> Impressive cluster size!  I do not have an answer for you, but could you
> change your architecture so that instead of one large NiFi cluster you have
> 2 or 3 smaller clusters?  Very curious on the answer here as I have also
> noticed UI slow-downs as the number of nodes increases.
>
> -Joe
> On 6/29/2021 3:45 AM, Modepalli Venkata Avinash wrote:
>
> Hi List,
>
>
>
> We have a 13-node NiFi cluster in production & it’s taking a huge amount of
> time to complete a NiFi restart.
>
> According to our analysis, flow election & flow validation from other
> nodes with coordinator is taking more time, approx. ~30 hours.
>
> Even after all 13 nodes gets connected, NIFI UI responds too slowly.
> Please find below cluster details.
>
>
>
> Apache NIFI Version : 1.9.0
>
> Flow.xml.gz size : 13MB (gz compressed)
>
> OS : RHEL 7.6
>
> JDK : jdk1.8.0_151
>
> GC : Default GC(Parallel GC) of JDK1.8 is in place. Commented out G1GC
> because of Numerous bugs in JDK8 while using with WriteaHeadProvenance
> Repository
>
> Min & Max Memory : 140GB
>
> Server Memory Per Node : 256GB
>
> CPU/CORE : 48
>
> Number of Nodes in Cluster : 13
>
> Max Timer Driven Thread : 100
>
> Running Processors Count : 12K
>
> Stopped Processors Count : 10K
>
> Disabled Processors Count : 25K
>
> Total Processor Count : 47K
>
>
>
> We couldn’t find any abnormalities in app logs, bootstrap logs & GC
> logging. Could you please share any input to identify & resolve this issue.
>
> Thanks for your help.
>
>
>
> *Thanks & Regards,*
>
> *Avinash M V*
>
>
>
>
>
>
>


Re: Taking Huge Time for Connect all Nodes to NIFI cluster

2021-06-29 Thread Joe Witt
I dont recommend splitting the clusters for this.  Lets figure out what is
happening for your case.

On Tue, Jun 29, 2021 at 10:25 PM Modepalli Venkata Avinash <
modepalli.avin...@subex.com> wrote:

> Hi Joe,
>
>
>
> Thanks for your feedback.
>
> Yes, working on splitting into multiple clusters. But it looks like this
> problem will still be there while restarting the NiFi cluster if the
> flow.xml.gz size grows into the tens of MBs (when the number of
> components/processors deployed on the root canvas increases).
>
>
>
> *Thanks & Regards,*
>
> *Avinash M V*
>
>
>
> *From:* Joe Obernberger 
> *Sent:* 29 June 2021 17:09
> *To:* users@nifi.apache.org; Modepalli Venkata Avinash <
> modepalli.avin...@subex.com>
> *Subject:* Re: Taking Huge Time for Connect all Nodes to NIFI cluster
>
>
>
> *CAUTION:* This e-mail originated from outside of the organization. Do
> not click links or open attachments unless you recognise the sender and
> know the content is safe.
>
>
>
> Impressive cluster size!  I do not have an answer for you, but could you
> change your architecture so that instead of one large NiFi cluster you have
> 2 or 3 smaller clusters?  Very curious on the answer here as I have also
> noticed UI slow-downs as the number of nodes increases.
>
> -Joe
>
> On 6/29/2021 3:45 AM, Modepalli Venkata Avinash wrote:
>
> Hi List,
>
>
>
> We have 13 Nodes NIFI cluster in production & it’s taking huge time for
> completing NIFI restart.
>
> According to our analysis, flow election & flow validation from other
> nodes with coordinator is taking more time, approx. ~30 hours.
>
> Even after all 13 nodes gets connected, NIFI UI responds too slowly.
> Please find below cluster details.
>
>
>
> Apache NIFI Version : 1.9.0
>
> Flow.xml.gz size : 13MB (gz compressed)
>
> OS : RHEL 7.6
>
> JDK : jdk1.8.0_151
>
> GC : Default GC(Parallel GC) of JDK1.8 is in place. Commented out G1GC
> because of Numerous bugs in JDK8 while using with WriteaHeadProvenance
> Repository
>
> Min & Max Memory : 140GB
>
> Server Memory Per Node : 256GB
>
> CPU/CORE : 48
>
> Number of Nodes in Cluster : 13
>
> Max Timer Driven Thread : 100
>
> Running Processors Count : 12K
>
> Stopped Processors Count : 10K
>
> Disabled Processors Count : 25K
>
> Total Processor Count : 47K
>
>
>
> We couldn’t find any abnormalities in app logs, bootstrap logs & GC
> logging. Could you please share any input to identify & resolve this issue.
>
> Thanks for your help.
>
>
>
> *Thanks & Regards,*
>
> *Avinash M V*
>
>
>
>
>
>
>
>
>
>


Re: PYFI Python Nifi Clone

2021-07-10 Thread Joe Witt
Sounds fun and looks cool.

But do not violate the marks such as do not use the Apache NiFi logo.

Thanks

On Sat, Jul 10, 2021 at 9:38 AM Darren Govoni  wrote:

> Hi!,
>Just sharing a fun project I'll post on github soon. I'm creating a
> pure python clone of Nifi that separates the UI (Vue/NodeJS implementation)
> from the backend distributed messaging layer (RabbitMQ, Redis, AMQP, SQS).
> It will allow for runtime scripting of processors using python and leverage
> a variety of transactional message brokers and distributed topologies (e.g.
> AMQP).
>
> Here is a sneak peek at my port of the UI to Vue/NodeJS which I'll share
> on github soon (minified). It's a fully MVC/Node/Vue reactive and
> responsive UI that adheres to Material Design 2.0 standard. Also uses
> webpack build and is minified, etc.
>
> Makes a number of improvements such as tabs for multiple flow renders and
> will interface directly with git for flow versioning.
>
> Cheers!
>
> Darren
>


[ANNOUNCE] Apache NiFi 1.14.0 release

2021-07-16 Thread Joe Witt
Hello

The Apache NiFi team would like to announce the release of Apache NiFi 1.14.0.

This is a significant feature, improvement, and stability focused release.

With this release users will now find that a default out of the box
NiFi starts with
important security controls enabled by default!

We have now merged the Apache NiFi, Apache NiFi Registry, the Apache
NiFi MiNiFi Java
and a stateless NiFi capability all in the same source release line
just available
via specific convenience binaries. Be sure to look at the release highlights
linked below for more details.

Apache NiFi is an easy to use, powerful, and reliable system to
process and distribute
data.  Apache NiFi was made for dataflow.  It supports highly
configurable directed graphs
of data routing, transformation, and system mediation logic.

Be sure, as always but in particular with this release, to review the
migration guidance
as provided https://cwiki.apache.org/confluence/display/NIFI/Migration+Guidance

More details on Apache NiFi can be found here:
https://nifi.apache.org/

The release artifacts can be downloaded from here:
https://nifi.apache.org/download.html

Maven artifacts have been made available and mirrored as per normal
ASF artifact processes.

Issues closed/resolved for this list can be found here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12349644

Release note highlights can be found here:
https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.14.0

Thank you
The Apache NiFi team


Re: NiFi Queue Monitoring

2021-07-21 Thread Joe Witt
Scott

Nifi supports both push and pull. Push via reporting tasks and pull via
rest api.

Are you needing a particular impl of a reporting task?

You are right this is a common need.  Solved using one of these methods.

Thanks
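For the pull route, below is a minimal sketch of flagging back pressure from a connection-status payload such as the one returned by /nifi-api/connections/{id}/status. The JSON shape and field names here (aggregateSnapshot, percentUseCount, percentUseBytes) are assumptions from memory — verify them against the REST API documentation for your NiFi version before relying on this:

```python
import json

# Sample payload shaped like NiFi's ConnectionStatusEntity (assumed shape --
# check the field names against your NiFi version's REST API documentation).
SAMPLE = json.loads("""
{
  "connectionStatus": {
    "id": "example-connection-id",
    "name": "to-PutSFTP",
    "aggregateSnapshot": {
      "flowFilesQueued": 9800,
      "percentUseCount": 98,
      "percentUseBytes": 12
    }
  }
}
""")

def is_near_backpressure(status_entity, threshold=80):
    """Return True when the queue's object count or byte usage is at or
    above `threshold` percent of its configured back-pressure limit."""
    snap = status_entity["connectionStatus"]["aggregateSnapshot"]
    return max(snap["percentUseCount"], snap["percentUseBytes"]) >= threshold

print(is_near_backpressure(SAMPLE))  # True: 98% of the object threshold
```

Polling this on a timer against every connection id gives a crude cluster-wide check without a reporting task, at the cost of the pull-model drawbacks raised elsewhere in this thread.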

On Wed, Jul 21, 2021 at 2:58 PM scott  wrote:

> Great comments all. I agree with the architecture comment about push
> monitoring. I've been monitoring applications for more than 2 decades now,
> but sometimes you have to work around the limitations of the situation. It
> would be really nice if NiFi had this logic built-in, and frankly I'm
> surprised it is not yet. I can't be the only one who has had to deal with
> queues filling up, causing problems downstream. NiFi certainly knows that
> the queues fill up, they change color and execute back-pressure logic. If
> it would just do something simple like write a log/error message to a log
> file when this happens, I would be good.
> I have looked at the new metrics and reporting tasks but still haven't
> found the right thing to do to get notified when any queue in my
> instance fills up. Are there any examples of using them for a similar task
> you can share?
>
> Thanks,
> Scott
>
> On Wed, Jul 21, 2021 at 11:29 AM u...@moosheimer.com 
> wrote:
>
>> In general, it is a bad architecture to do monitoring via pull request.
>> You should always push. I recommend a look at the book "The Art of
>> Monitoring" by James Turnbull.
>>
>> I also recommend the very good articles by Pierre Villard on the subject
>> of NiFi monitoring at
>> https://pierrevillard.com/2017/05/11/monitoring-nifi-introduction/.
>>
>> Hope this helps.
>>
>> Mit freundlichen Grüßen / best regards
>> Kay-Uwe Moosheimer
>>
>> Am 21.07.2021 um 16:45 schrieb Andrew Grande :
>>
>> 
>> Can't you leverage some of the recent nifi features and basically run sql
>> queries over NiFi metrics directly as part of the flow? Then act on it with
>> a full flexibility of the flow. Kinda like a push design.
>>
>> Andrew
>>
>> On Tue, Jul 20, 2021, 2:31 PM scott  wrote:
>>
>>> Hi all,
>>> I'm trying to setup some monitoring of all queues in my NiFi instance,
>>> to catch before queues become full. One solution I am looking at is to use
>>> the API, but because I have a secure NiFi that uses LDAP, it seems to
>>> require a token that expires in 24 hours or so. I need this to be an
>>> automated solution, so that is not going to work. Has anyone else tackled
>>> this problem with a secure LDAP enabled cluster?
>>>
>>> Thanks,
>>> Scott
>>>
>>


Re: [NiFi-8760] Processors fail to process flowfiles with VolatileContentRepository

2021-07-23 Thread Joe Witt
It seems like any use case that we previously thought VolatileContentRepo
would be good for now we'd say Stateless NiFi is a dramatically better
approach.

We need to doc this better but the capability is there now for sure.

On Fri, Jul 23, 2021 at 8:13 AM Mark Payne  wrote:

> Matthieu,
>
> I would highly recommend against using VolatileContentRepository. You’re
> the first one I’ve heard of using it in a few years. Typically, the
> FileSystemRepository is sufficient. If you truly want to run with the
> content in RAM I would recommend creating a RAM Disk and pointing the
> FileSystemRepository to that.
>
> Thanks
> -Mark
>
>
> On Jul 21, 2021, at 10:31 AM, Matthieu Ré  wrote:
>
> Hi Chris, thank you for your quick response
>
> I tried the flow with 1.13.2 and 1.13.1, and 1.14.0 just before the first
> RC and it still had the problem, so I am not sure if this is related to the
> session handling you pointed out, that has been fixed in 1.13.2
>
> Le mer. 21 juil. 2021 à 16:22, Chris Sampson 
> a écrit :
>
>> 1.13.1 was known to have problems with session handling - see the Release
>> Note "lowlights" for 1.13.1 [1]
>>
>> It is recommended to upgrade to version 1.13.2 (or the latest 1.14.0). If
>> you can't upgrade then 1.13.0 would be better than 1.13.1.
>>
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.13.1
>>
>> ---
>> *Chris Sampson*
>> IT Consultant
>> chris.samp...@naimuri.com
>> 
>>
>>
>> On Wed, 21 Jul 2021 at 15:14, Matthieu Ré  wrote:
>>
>>> Hi all,
>>>
>>> Currently using NiFi 1.11.4, we face a blocking issue trying to switch
>>> to NiFi 1.13.1+ due to the VolatileContentRepository : some processors we
>>> use (and probably others that we didn't try) were not able to process
>>> flowfiles, such as MergeRecord, QueryRecord or SplitJson (logs are in the 
>>> Jira
>>> ticket NiFi-8760 ).
>>>
>>> I wanted to know if any of you guys are able to reproduce the issue, and
>>> if this is not a misconfiguration from our side. The nifi.properties and
>>> flow.xml.gz used are available in the ticket. If I am not missing anything,
>>> we could identify that the issue could come from this commit
>>> 
>>>  since
>>> it appeared with the 1.13.1 and the flow is working fine with 1.13.0.
>>>
>>> Open to contribute as much as I can if you confirm that this is not due
>>> to a misconfiguration..
>>>
>>> Thanks !
>>> Matthieu
>>>
>>
>
> --
>
> Matthieu RÉ
> Data Scientist - Machine Learning Engineer - Dassault Systèmes
>
> ENSIIE, M2 AIC (Université Paris-Saclay)
>
> Tel: 0631609755
>
> Email: re.matth...@gmail.com
>
>
>


Re: NiFi Queue Monitoring

2021-07-27 Thread Joe Witt
Scott

This sounds pretty darn cool.  Any chance you'd be interested in
kicking out a blog on it?

Thanks
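Since the LoggingRecordSink lands the query results in nifi-app.log, alerting can be a small log scrape. A sketch, assuming each record is written as one JSON object at the end of a log line and that the record fields include sourceName/destinationName/isBackPressureEnabled — both assumptions should be checked against what your configured record writer and logback layout actually emit:

```python
import json
import re

# A log line roughly as the LoggingRecordSink might emit it with a JSON
# record writer -- the exact layout depends on your configuration, so
# treat this line and the field names as assumptions.
LINE = ('2021-07-27 09:58:12,345 INFO [Timer-Driven Process Thread-4] '
        'o.a.n.record.sink.LoggingRecordSink '
        '{"sourceName":"GetFile","destinationName":"RouteOnAttribute",'
        '"isBackPressureEnabled":true,"queuedCount":10000}')

# Grab a trailing {...} JSON object from the end of each log line.
RECORD_RE = re.compile(r'(\{.*\})\s*$')

def backpressured_connections(lines):
    """Yield (source, destination) for every logged record that reports
    back pressure engaged."""
    for line in lines:
        m = RECORD_RE.search(line)
        if not m:
            continue
        rec = json.loads(m.group(1))
        if rec.get("isBackPressureEnabled"):
            yield rec["sourceName"], rec["destinationName"]

print(list(backpressured_connections([LINE])))
```

From there a cron job or log shipper can raise the alert however the environment prefers.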

On Tue, Jul 27, 2021 at 9:58 AM scott  wrote:
>
> Matt/all,
> I was able to solve my problem using the QueryNiFiReportingTask with "SELECT 
> * FROM CONNECTION_STATUS WHERE isBackPressureEnabled = true" and the new 
> LoggingRecordSink as you suggested. Everything is working flawlessly now. 
> Thank you again!
>
> Scott
>
> On Wed, Jul 21, 2021 at 5:09 PM Matt Burgess  wrote:
>>
>> Scott,
>>
>> Glad to hear it! Please let me know if you have any questions or if
>> issues arise. One thing I forgot to mention is that I think
>> backpressure prediction is disabled by default due to the extra
>> consumption of CPU to do the regressions, make sure the
>> "nifi.analytics.predict.enabled" property in nifi.properties is set to
>> "true" before starting NiFi.
>>
>> Regards,
>> Matt
>>
>> On Wed, Jul 21, 2021 at 7:21 PM scott  wrote:
>> >
>> > Excellent! Very much appreciate the help and for setting me on the right 
>> > path. I'll give the queryNiFiReportingTask code a try.
>> >
>> > Scott
>> >
>> > On Wed, Jul 21, 2021 at 3:26 PM Matt Burgess  wrote:
>> >>
>> >> Scott et al,
>> >>
>> >> There are a number of options for monitoring flows, including
>> >> backpressure and even backpressure prediction:
>> >>
>> >> 1) The REST API for metrics. As you point out, it's subject to the
>> >> same authz/authn as any other NiFi operation and doesn't sound like it
>> >> will work out for you.
>> >> 2) The Prometheus scrape target via the REST API. The issue would be
>> >> the same as #1 I presume.
>> >> 3) PrometheusReportingTask. This is similar to the REST scrape target
>> >> but isn't subject to the usual NiFi authz/authn stuff, however it does
>> >> support SSL/TLS for a secure solution (and is also a "pull" approach
>> >> despite it being a reporting task)
>> >> 4) QueryNiFiReportingTask. This is not included with the NiFi
>> >> distribution but can be downloaded separately, the latest version
>> >> (1.14.0) is at [1]. I believe this is what Andrew was referring to
>> >> when he mentioned being able to run SQL queries over the information,
>> >> you can do something like "SELECT * FROM CONNECTION_STATUS_PREDICTIONS
>> >> WHERE predictedTimeToBytesBackpressureMillis < 1". This can be
>> >> done either as a push or pull depending on the Record Sink you choose.
>> >> A SiteToSiteReportingRecordSink, KafkaRecordSink, or LoggingRecordSink
>> >> results in a push (to NiFi, Kafka, or nifi-app.log respectively),
>> >> where a PrometheusRecordSink results in a pull the same as #2 and #3.
>> >> There's even a ScriptedRecordSink where you can write your own script
>> >> to put the results where you want them.
>> >> 5) The other reporting tasks. These have been mentioned frequently in
>> >> this thread so no need for elaboration here :)
>> >>
>> >> Regards,
>> >> Matt
>> >>
>> >> [1] 
>> >> https://repository.apache.org/content/repositories/releases/org/apache/nifi/nifi-sql-reporting-nar/1.14.0/
>> >>
>> >> On Wed, Jul 21, 2021 at 5:58 PM scott  wrote:
>> >> >
>> >> > Great comments all. I agree with the architecture comment about push 
>> >> > monitoring. I've been monitoring applications for more than 2 decades 
>> >> > now, but sometimes you have to work around the limitations of the 
>> >> > situation. It would be really nice if NiFi had this logic built-in, and 
>> >> > frankly I'm surprised it is not yet. I can't be the only one who has 
>> >> > had to deal with queues filling up, causing problems downstream. NiFi 
>> >> > certainly knows that the queues fill up, they change color and execute 
>> >> > back-pressure logic. If it would just do something simple like write a 
>> >> > log/error message to a log file when this happens, I would be good.
>> >> > I have looked at the new metrics and reporting tasks but still haven't 
>> >> > found the right thing to do to get notified when any queue in my 
>> >> > instance fills up. Are there any examples of using them for a similar 
>> >> > task you can share?
>> >> >
>> >> > Thanks,
>> >> > Scott
>> >> >
>> >> > On Wed, Jul 21, 2021 at 11:29 AM u...@moosheimer.com 
>> >> >  wrote:
>> >> >>
>> >> >> In general, it is a bad architecture to do monitoring via pull 
>> >> >> request. You should always push. I recommend a look at the book "The 
>> >> >> Art of Monitoring" by James Turnbull.
>> >> >>
>> >> >> I also recommend the very good articles by Pierre Villard on the 
>> >> >> subject of NiFi monitoring at 
>> >> >> https://pierrevillard.com/2017/05/11/monitoring-nifi-introduction/.
>> >> >>
>> >> >> Hope this helps.
>> >> >>
>> >> >> Mit freundlichen Grüßen / best regards
>> >> >> Kay-Uwe Moosheimer
>> >> >>
>> >> >> Am 21.07.2021 um 16:45 schrieb Andrew Grande :
>> >> >>
>> >> >> 
>> >> >> Can't you leverage some of the recent nifi features and basically run 
>> >> >> sql queries over NiFi metrics directly as part of the flow? Then act 
>> >> >> on it with a full flexibili

Re: odd performance behavior 1.14

2021-07-31 Thread Joe Witt
Scott

Nope this sounds pretty dang unique

What JVM?   May need to attach a profiler.

I have seen buried exceptions happening at massive rates causing horrid
performance among a few other scenarios but nothing specific to 1.14

thanks

On Sat, Jul 31, 2021 at 4:01 PM scott  wrote:

> Hi All,
> I upgraded to 1.14 last week and within a few days I started to see some
> pretty odd behavior, I'm hoping someone has either seen it before or could
> point me to a deeper level of troubleshooting.
>
> Here are the symptoms I observed.
> * Performance issues:
> Processor performance very poor. Even simple processors like router
> and updateattribute went from being able to process 100,000 recs/min to
> 100 recs/min or stopping altogether, but not consistently.
> Processors needing to be force killed, even simple ones like
> updateattribute.
>
> * Weirdness. One of my routers lost its mind and didn't recognize the
> routes configured anymore. It changed all the arrows to dotted lines except
> for the default. I ended up copying it and replacing it with the copy, no
> changes mind you, but it worked fine.
>
> * Errors: I have not found any obvious errors in the nifi logs that could
> explain this, but one error keeps repeating in the logs: SQLServerDriver is
> not found. I have dozens of processors that use SQL Server, all seem to be
> working fine. This is not tied to a particular processor's configuration. I
> don't think this is related.
>
> * Server resources fine. I use htop and sar to troubleshoot hardware
> issues usually, all looks normal. I added 1/3 more memory to the JVM, now
> at 24GB, just for good measure, but that had no effect.
>
> Is it possible there are some hidden performance issues going on within
> the JVM I need a special tool to see?
>
> Any help would be greatly appreciated.
>
> Thanks,
> Scott
>
>
>
>


Re: Frozen Relationships - Remote Process Groups

2021-08-02 Thread Joe Witt
Ryan

Don't think that is a known issue/pattern we've heard much of.
Definitely will need a stack trace set when you observe it.  Ideally
you get a stack dump when all looks good, then a stack trace once the
freeze had started, then again after some time within the freeze
again.  Share those along with the timestamps and observations please.

Thanks
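For gathering that series of dumps, here is a rough sketch of a timestamped capture helper. The jcmd call assumes a JDK on PATH and permission to attach to the NiFi JVM; `bin/nifi.sh dump <file>` is an alternative if your version ships it:

```python
import datetime
import subprocess

def dump_filename(prefix, ts):
    """Build a timestamped file name so dumps taken before, during, and
    after a freeze can be lined up against what was observed."""
    return f"{prefix}-{ts.strftime('%Y%m%dT%H%M%S')}.txt"

def capture_thread_dump(pid, prefix="nifi-threads"):
    """Write one JVM thread dump to a timestamped file and return its name.
    Assumes jcmd is on PATH and is run as the user owning the NiFi JVM."""
    name = dump_filename(prefix, datetime.datetime.now())
    result = subprocess.run(["jcmd", str(pid), "Thread.print"],
                            capture_output=True, text=True, check=True)
    with open(name, "w") as f:
        f.write(result.stdout)
    return name

# Example: capture_thread_dump(12345) once while healthy, once when the
# freeze starts, and again a few minutes into the freeze.
```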

On Mon, Aug 2, 2021 at 12:16 PM Ryan Hendrickson
 wrote:
>
> Hi all,
>Our team just upgraded to NiFi 1.13.2.  We've noticed frozen relationships 
> where data just queues up until the Remote Process Group is started/stopped 
> again.
>
>I don't have any specific stacktraces isolated yet, just curious if anyone 
> else has experienced any issues with communication across Remote Process 
> Groups in 1.13.2
>
> Thanks,
> Ryan


Re: Re: Disfunctional cluster with version 1.13.2

2021-08-17 Thread Joe Witt
If restarting is helping you then generally it should be easily found.  Is
mem usage spiking?  Are thread dumps revealing?

On Tue, Aug 17, 2021 at 6:12 AM Ryan Hendrickson <
ryan.andrew.hendrick...@gmail.com> wrote:

> We've rolled back to 1.11.4 in a couple scenarios.  We have also set up a
> cron job to restart 1.13.2 nodes once a day.
>
> Ryan
>
> On Tue, Aug 17, 2021 at 1:25 AM Axel Schwarz  wrote:
>
>> Hey Ryan,
>>
>> that sounds awfully familiar. What we successfully battled so far is the
>> load balancing problem.
>> You can find the whole plot of this drama in the mailing list archive,
>> title is "No Load Balancing since 1.13.2"
>>
>> Of course I will keep this thread updated, but unfortunately we had to
>> make the decision to roll back completely to 1.12.1 because we just cannot
>> afford investing more time into this right now. But we'll certainly come
>> back to this later. We have to...
>>
>> --- Ursprüngliche Nachricht ---
>> Von: Ryan Hendrickson 
>> Datum: 17.08.2021 03:32:25
>> An: users@nifi.apache.org
>> Betreff: Re: Disfunctional cluster with version 1.13.2
>>
>> > Axel,
>> >We've had significant issues with 1.13.2 in a Cluster as well.  We're
>> > working on a test config... Issues range from abandoned FlowFiles, single
>> > Nodes locking the entire cluster, load balance relationships not working,
>> > and undocumented nifi properties.  We're reluctant to move to 1.14.0
>> > because we haven't seen anything specifically fixed in it.
>> >
>> >Please keep the community up-to-date on your findings.
>> >
>> > Ryan
>> >
>> > On Mon, Aug 16, 2021 at 11:00 AM Pierre Villard <
>> pierre.villard...@gmail.com>
>> >
>> > wrote:
>> >
>> > > Hi,
>> > >
>> > > What's the version of ZK?
>> > >
>> > > Thanks,
>> > > Pierre
>> > >
>> > > Le jeu. 12 août 2021 à 09:55, Axel Schwarz 
>> > a
>> > > écrit :
>> > >
>> > >> Dear all,
>> > >>
>> > >> after successfully battling the load balancing and installing Version
>> >
>> > >> 1.13.2 again in our 3 node production environment, we experienced
>> > another
>> > >> failure in the cluster resulting in a complete cut-off of the flow
>> > just
>> > >> 1,5h after the update.
>> > >> We noticed it just by trying to access the webinterface, which
>> > >> immediately after login showed something like:
>> > >>
>> > >> "Cannot replicate request to Node nifiHost1.contoso.com:8443
>> > because the
>> > >> node is not connected"
>> > >>
>> > >> There was nothing we could do through the webinterface aside from
>> > staring
>> > >> at this message and when looking at the live logs, there was nothing
>> >
>> > >> suspicious. The log moved on as if nothing happened.
>> > >> After a restart of the cluster everything was working fine again,
>> > but we
>> > >> saw, that the entire flow wasn't working for some period of time.
>> > This
>> > >> alone is really uncool, as we running a cluster for exactly that
>> > reason:
>> > >> The flow should keep working, even if some node decides to
>> malfunction
>> > for
>> > >> whatever reason.
>> > >>
>> > >> Digging a little deeper into the logs showed two noticable problems:
>> >
>> > >>
>> > >> 1. The Zookeeper is restarting every few minutes. Which in the log
>> > always
>> > >> looks like this:
>> > >>
>> > >> (nifiHost1.contoso.com)
>> > >>
>> nifi-app.log
>> >
>> > >>
>> > >> 2021-08-11 12:02:39,187 INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181
>> )(secure=disabled)]
>> >
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> 2021-08-11 12:02:39,187 INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181
>> )(secure=disabled)]
>> >
>> > >> o.a.zookeeper.server.ZooKeeperServer shutting down
>> > >> 2021-08-11 12:02:39,194 INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181
>> )(secure=disabled)]
>> >
>> > >> o.a.z.server.FinalRequestProcessor shutdown of request processor
>> > complete
>> > >> 2021-08-11 12:02:39,196 INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181
>> )(secure=disabled)]
>> >
>> > >> o.a.z.server.SyncRequestProcessor Shutting down
>> > >> 2021-08-11 12:02:39,196 INFO [SyncThread:1]
>> > >> o.a.z.server.SyncRequestProcessor SyncRequestProcessor exited!
>> > >> 2021-08-11 12:02:39,199 INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181
>> )(secure=disabled)]
>> >
>> > >> o.a.zookeeper.server.ZooKeeperServer minSessionTimeout set to 4000
>> >
>> > >> 2021-08-11 12:02:39,200 INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181
>> )(secure=disabled)]
>> >
>> > >> o.a.zookeeper.server.ZooKeeperServer maxSessionTimeout set to 4
>> >
>> > >> 2021-08-11 12:02:39,200 INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181
>> )(secure=disabled)]
>> >
>> > >> o.a.zookeeper.server.ZooKeeperServer Created server with tickTime
>> > 2000
>> > >> minSessionTimeout 4000 maxSessionTimeout 4 datadir
>> > >> /opt/nifi/logs/zookeeper/version-2 snapdir
>> > >> /opt/nifi/state/zookeeper/version-2
>> > >> 2021-08-11 12

Re: Deadlock in loop

2021-08-18 Thread Joe Witt
Hello

This case should work very well.  Please share the details of the flow
configuration.  Can you download a flow template and share that?

thanks

On Wed, Aug 18, 2021 at 8:20 AM Aurélien Mazoyer 
wrote:

> Hi,
>
> I have a nifi flow that reads zip files. For each non-zip file it performs
> some treatment on its content and for each zip file it unzips it and
> performs the treatment on files in the archive. There is a loop in the flow
> so if a zip contains a zip, this zip will be reinjected at the beginning of
> the flow to be processed (and so on). However, when I have several zips in
> an archive, I experience a deadlock in my loop. Is there a solution to
> mitigate this issue in NiFi, such as having back pressure on the first
> processor of the loop depending on the state of the queues in the loop?
>
> Thank you,
>
> Aurelien
>
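The recursion Aurélien describes (each nested zip re-injected into the flow) can be sketched outside NiFi to reason about termination; this is a hypothetical stand-alone sketch, not the flow itself, and it sidesteps the loop-queue back pressure problem by unpacking all nesting levels in one step:

```python
import io
import zipfile

def unpack(data, out):
    """Recursively unpack zip bytes; nested zips are 're-injected' just like
    the loop in the flow, while non-zip entries land in `out`."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            payload = zf.read(name)
            if zipfile.is_zipfile(io.BytesIO(payload)):
                unpack(payload, out)   # a zip inside a zip: recurse
            else:
                out[name] = payload

# build a zip that contains a zip that contains a text file
inner = io.BytesIO()
with zipfile.ZipFile(inner, "w") as zf:
    zf.writestr("leaf.txt", "hello")
outer = io.BytesIO()
with zipfile.ZipFile(outer, "w") as zf:
    zf.writestr("inner.zip", inner.getvalue())

result = {}
unpack(outer.getvalue(), result)
print(result)  # {'leaf.txt': b'hello'}
```

In NiFi terms, doing the recursion inside one scripted processor means the flow's loop connection never has to hold the intermediate zips, which is what makes back pressure on a self-loop deadlock-prone.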


Re: Make NiFi Flow Read Only on Disconnect

2021-08-26 Thread Joe Witt
Shawn

So that is one direction (being more restrictive).  Another direction
is to simply ditch the logic we have and allow changes and simply
update the disconnected node when it rejoins.  We have all kinds of
super complex super duper awesome logic in there to help prevent users
from getting into a bad state.  What we have found is it was a lot of
effort with minimal payoff.  The simpler model is simply 'add the node
to the cluster, make changes to ensure the flow matches, and move on'.
The only failure case would be when connecting and there is data in a
connection which no longer exists.  We can make exception handling for
that mode.

How does that sound for you?

Thanks

On Thu, Aug 26, 2021 at 7:51 AM Shawn Weeks  wrote:
>
> Hi, I know there have been a lot of improvements handling flow.xml.gz 
> differences between nodes if a node gets disconnected or is down. I was 
> wondering if there is a way to prevent NiFi from allowing any flow changes if 
> all nodes are not up and available, both on the node that’s in a 
> “DISCONNECTED” state and for the remaining cluster nodes. I’m trying to 
> prevent the scenarios where a node ends up a disconnected state but still 
> allows changes making reconnections more challenging.
>
>
>
> Thanks
>
> Shawn


Re: Make NiFi Flow Read Only on Disconnect

2021-08-26 Thread Joe Witt
Shawn

Ok cool.  I think we'll go that route.  A lot less code.  A lot easier
to reason over.  We tried to be clever and it kept backfiring.
Simpler wins.

Thanks

On Thu, Aug 26, 2021 at 8:28 AM Shawn Weeks  wrote:
>
> As long as we have a check to make sure no data is in flow on the node 
> joining that sounds wonderful and a lot simpler. In my most recent case I 
> just stopped the inputs and waited till everything cleared and then shut down 
> and deleted flow.xml.gz and restarted. That's already worlds easier than how 
> it used to be.
>
> Thanks
> Shawn
>
> -Original Message-
> From: Joe Witt 
> Sent: Thursday, August 26, 2021 10:23 AM
> To: users@nifi.apache.org
> Subject: Re: Make NiFi Flow Read Only on Disconnect
>
> Shawn
>
> So that is one direction (being more restrictive).  Another direction is to 
> simply ditch the logic we have and allow changes and simply update the 
> disconnected node when it rejoins.  We have all kinds of super complex super 
> duper awesome logic in there to help prevent users from getting into a bad 
> state.  What we have found is it was a lot of effort with minimal payoff.  
> The simpler model is simply 'add the node to the cluster, make changes to 
> ensure the flow matches, and move on'.
> The only failure case would be when connecting and there is data in a 
> connection which no longer exists.  We can make exception handling for that 
> mode.
>
> How does that sound for you?
>
> Thanks
>
> On Thu, Aug 26, 2021 at 7:51 AM Shawn Weeks  wrote:
> >
> > Hi, I know there have been a lot of improvements handling flow.xml.gz 
> > differences between nodes if a node gets disconnected or is down. I was 
> > wondering if there is a way to prevent NiFi from allowing any flow changes 
> > if all nodes are not up and available, both on the node that’s in a 
> > “DISCONNECTED” state and for the remaining cluster nodes. I’m trying to 
> > prevent the scenarios where a node ends up a disconnected state but still 
> > allows changes making reconnections more challenging.
> >
> >
> >
> > Thanks
> >
> > Shawn


Re: Nifi & rtsp or rtmp

2021-08-30 Thread Joe Witt
Valentina,

Can you describe what you want to be able to do with the streaming video?

Thanks

On Mon, Aug 30, 2021 at 6:46 AM Valentina Ivanova
 wrote:
>
> Hello!
>
> I am tasked with handling streaming video in rtmp or rtsp and have quite a short 
> time to set it up. So I am wondering if Nifi is the best tool to use for this 
> task.
> I found this thread from 4 years ago: 
> http://mail-archives.apache.org/mod_mbox/nifi-users/201709.mbox/%3c3a956343-ec9f-4afa-975b-a9487bd04...@acesinc.net%3E
> and wonder if there are any development in that direction since then. I did 
> not find anything more recent.
>
> As far as I understand, ListenTCP or ListenUDP would be my best
> option in NiFi. Is this correct? Any guidelines on how to set such a flow up would 
> be very appreciated. Would it be better/easier to use another tool?
>
> Thanks & all the best
> Valentina


Re: Round robin load balancing eventually stops using all nodes

2021-09-07 Thread Joe Witt
Ryan

If this is so easily replicated for you it should be trivially found and
fixed most likely.

Please share, for each node in your cluster, both a thread dump and heap
dump within 30 mins of startup and again after 24 hours.

This will allow us to see the delta and if there appears to be any sort of
leak.   If you cannot share these then you can do that analysis and share
the results.

Nobody should have to restart nodes to keep things healthy.

Joe

On Tue, Sep 7, 2021 at 12:58 PM Ryan Hendrickson <
ryan.andrew.hendrick...@gmail.com> wrote:

> We have a daily cron job that restarts our nifi cluster to keep it in a
> good state.
>
> On Mon, Sep 6, 2021 at 6:11 PM Mike Thomsen 
> wrote:
>
>> >  there is a ticket to overcome this (there is no ETA), but other
>> details might shed light to a different root cause.
>>
>> Good to know I'm not crazy, and it's in the TODO. Until then, it seems
>> fixable by bouncing the box.
>>
>> On Mon, Sep 6, 2021 at 7:14 AM Simon Bence 
>> wrote:
>> >
>> > Hi Mike,
>> >
>> > I did a quick check on the round robin balancing and based on what I
>> found the reason for the issue must lie somewhere else, not directly within
>> it. The one thing I can think of is the scenario where one (or more) nodes
>> are significantly slower than the other ones. In these cases it might
>> happen that the nodes which are “running behind” block the other nodes from
>> a balancing perspective. >
>> >
>> > Based on what you wrote this is a possible reason and there is a ticket
>> to overcome this (there is no ETA), but other details might shed light to a
>> different root cause.
>> >
>> > Regards,
>> > Bence
>> >
>> >
>> >
>> > > On 2021. Sep 3., at 14:13, Mike Thomsen 
>> wrote:
>> > >
>> > > We have a 5 node cluster, and sometimes I've noticed that round robin
>> > > load balancing stops sending flowfiles to two of them, and sometimes
>> > > toward the end of the data processing can get as low as a single node.
>> > > Has anyone seen similar behavior?
>> > >
>> > > Thanks,
>> > >
>> > > Mike
>> >
>>
>


Re: Round robin load balancing eventually stops using all nodes

2021-09-27 Thread Joe Witt
Ryan,

Regarding NIFI-9236 the JIRA captures it well but sounds like there is
now a better understanding of how it works and what options exist to
better view details.

Regarding Load Balancing: NIFI-7081 is largely about the scenario
whereby in load balancing cases nodes which are slower effectively set
the rate the whole cluster can sustain because we don't have a fluid
load balancing strategy which we should.  Such a strategy would allow
for the fastest nodes to always take the most data.  We just need to
do that work.  No ETA.

Thanks

On Tue, Sep 21, 2021 at 2:18 PM Ryan Hendrickson
 wrote:
>
> Joe - We're testing some scenarios.  Andrew captured some confusing behavior 
> in the UI when enabling and disabling load balancing on a relationship: 
> "Update UI for Clustered Connections" -- 
> https://issues.apache.org/jira/projects/NIFI/issues/NIFI-9236
>
> Question - When a FlowFile is Load Balanced from one node to another, is the 
> entire Content Claim load balanced?  Or just the small portion necessary?
>
> Mike -
> We found two tickets that are in the ballpark:
>
> 1.  Improve handling of Load Balanced Connections when one node is slow   --  
>   https://issues.apache.org/jira/browse/NIFI-7081
> 2.  NiFi FlowFiles stuck in queue when using Single Node load balance 
> strategy   --https://issues.apache.org/jira/browse/NIFI-8970
>
> From @Simon comment - we know we've seen underperforming nodes in a cluster 
> before.  We're discussing @Simon's comment is applicable to the issue we're 
> seeing
>   > "The one thing I can think of is the scenario where one (or more) 
> nodes are significantly slower than the other ones. In these cases it might 
> happen that the nodes which are “running behind” block the other nodes from 
> a balancing perspective."
>
> @Simon - I'd like to understand the "blocks other nodes from balancing 
> perspective" better if you have additional information.  We're trying to 
> replicate this scenario.
>
> Thanks,
> Ryan
>
> On Sat, Sep 18, 2021 at 3:45 PM Mike Thomsen  wrote:
>>
>> > there is a ticket to overcome this (there is no ETA),
>>
>> Do you know what the Jira # is?
>>
>> On Mon, Sep 6, 2021 at 7:14 AM Simon Bence  wrote:
>> >
>> > Hi Mike,
>> >
>> > I did a quick check on the round robin balancing and based on what I found 
>> > the reason for the issue must lie somewhere else, not directly within it. 
>> > The one thing I can think of is the scenario where one (or more) nodes are 
>> > significantly slower than the other ones. In these cases it might happen 
>> > that the nodes which are “running behind” block the other nodes from a 
>> > balancing perspective.
>> >
>> > Based on what you wrote this is a possible reason and there is a ticket to 
>> > overcome this (there is no ETA), but other details might shed light to a 
>> > different root cause.
>> >
>> > Regards,
>> > Bence
>> >
>> >
>> >
>> > > On 2021. Sep 3., at 14:13, Mike Thomsen  wrote:
>> > >
>> > > We have a 5 node cluster, and sometimes I've noticed that round robin
>> > > load balancing stops sending flowfiles to two of them, and sometimes
>> > > toward the end of the data processing can get as low as a single node.
>> > > Has anyone seen similar behavior?
>> > >
>> > > Thanks,
>> > >
>> > > Mike
>> >


Re: Trouble starting docker container

2021-10-18 Thread Joe Witt
Glad you found the problem and shared the resolution.  Thanks

On Mon, Oct 18, 2021 at 8:58 AM Jean-Sebastien Vachon <
jsvac...@brizodata.com> wrote:

> I fixed my problem... it was related to this
>
>
> https://stackoverflow.com/questions/69081508/nifi-migration-required-for-blank-sensitive-properties-key
>
>
> Once I restored the value of nifi.sensitive.props.key, everything is fine
>
>
>
>
> *Jean-Sébastien Vachon *
> Co-Founder & Architect
>
>
> *Brizo Data, Inc. www.brizodata.com
> 
> *
> --
> *From:* Jean-Sebastien Vachon 
> *Sent:* Monday, October 18, 2021 10:54 AM
> *To:* users@nifi.apache.org 
> *Subject:* Re: Trouble starting docker container
>
> I was able to recover the flow.xml.gz file from the container using this
> command:
>
> sudo docker cp nifi:/opt/nifi/nifi-current/conf/flow.xml.gz .
>
> But I can't seem to be able to use it in another container by mounting it
> as a volume.
>
> A new Nifi 1.14 based container will give me about the same error:
>
> 2021-10-18 14:43:15,660 ERROR [main]
> o.a.nifi.properties.NiFiPropertiesLoader Flow Configuration
> [./conf/flow.xml.gz] Found: Migration Required for blank Sensitive
> Properties Key [nifi.sensitive.props.key]
> 2021-10-18 14:43:15,662 ERROR [main] org.apache.nifi.NiFi Failure to
> launch NiFi due to java.lang.IllegalArgumentException: There was an issue
> decrypting protected properties
> java.lang.IllegalArgumentException: There was an issue decrypting
> protected properties
> at org.apache.nifi.NiFi.initializeProperties(NiFi.java:346)
> at
> org.apache.nifi.NiFi.convertArgumentsToValidatedNiFiProperties(NiFi.java:314)
> at
> org.apache.nifi.NiFi.convertArgumentsToValidatedNiFiProperties(NiFi.java:310)
> at org.apache.nifi.NiFi.main(NiFi.java:302)
> Caused by:
> org.apache.nifi.properties.SensitivePropertyProtectionException: Sensitive
> Properties Key [nifi.sensitive.props.key] not found: See Admin Guide
> section [Updating the Sensitive Properties Key]
> at
> org.apache.nifi.properties.NiFiPropertiesLoader.getDefaultProperties(NiFiPropertiesLoader.java:226)
> at
> org.apache.nifi.properties.NiFiPropertiesLoader.get(NiFiPropertiesLoader.java:209)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.nifi.NiFi.initializeProperties(NiFi.java:341)
> ... 3 common frames omitted
>
> I also tried using the flowfile with a previous version of Nifi... Nifi
> wants to start but then complains about this:
>
> 2021-10-18 14:53:09,727 INFO [main] org.eclipse.jetty.server.Server Started
> @27643ms
> 2021-10-18 14:53:09,727 WARN [main] org.apache.nifi.web.server.JettyServer
> Failed to start web server... shutting down.
> ...
> org.apache.nifi.web.NiFiCoreException: Unable to start Flow Controller.
> Caused by: java.nio.file.FileSystemException: ./conf/flow.xml.gz: Device
> or resource busy
>
>
> Is there any way to move a flow.xml.gz from one machine to another?
>
>
>
> *Jean-Sébastien Vachon *
> Co-Founder & Architect
>
>
> *Brizo Data, Inc. www.brizodata.com
> 
> *
> --
> *From:* Jean-Sebastien Vachon 
> *Sent:* Monday, October 18, 2021 9:24 AM
> *To:* users@nifi.apache.org 
> *Subject:* Trouble starting docker container
>
> Hi all,
>
> I used an instance of Nifi 1.14 last friday (single user with password)
> and everything was fine until this morning.
> My PC was rebooted over the weekend and now I can't restart the container
> at all.
>
>
> Java home: /usr/local/openjdk-8
> NiFi home: /opt/nifi/nifi-current
>
> Bootstrap Config File: /opt/nifi/nifi-current/conf/bootstrap.conf
>
> 2021-10-18 13:17:22,856 INFO [main] org.apache.nifi.bootstrap.Command
> Starting Apache NiFi...
> 2021-10-18 13:17:22,856 INFO [main] org.apache.nifi.bootstrap.Command
> Working Directory: /opt/nifi/nifi-current
> 2021-10-18 13:17:22,856 INFO [main] org.apache.nifi.bootstrap.Command
> Command: /usr/local/openjdk-8/bin/java -classpath
> /opt/nifi/nifi-current/./conf:/opt/nifi/nifi-current/./lib/lo
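For anyone hitting the same error, the fix Jean-Sebastien describes can be applied when (re)creating the container. This is a hedged sketch: it assumes the apache/nifi image's environment-variable support, where NIFI_SENSITIVE_PROPS_KEY is mapped onto nifi.sensitive.props.key by the image's start script (check the image docs for your version). The key must be at least 12 characters on 1.14+ and, for a recovered flow, must match the key the flow's sensitive values were originally encrypted with.

```shell
# start a fresh 1.14.0 container with an explicit sensitive props key
docker run -d --name nifi \
  -e NIFI_SENSITIVE_PROPS_KEY='myTwelveCharKey1' \
  -p 8443:8443 apache/nifi:1.14.0

# copy a recovered flow into the new container, then restart it;
# docker cp avoids the "Device or resource busy" error that bind-mounting
# the single flow.xml.gz file produced above
docker cp flow.xml.gz nifi:/opt/nifi/nifi-current/conf/flow.xml.gz
docker restart nifi
```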

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-20 Thread Joe Witt
Jens,

"After fetching a FlowFile-stream file and unpacked it back into NiFi
I calculate a sha256. 1 minute later I recalculate the sha256 on the
exact same file and get a new hash. That is what worries me.
The fact that the same file can be recalculated and produce two
different hashes, is very strange, but it happens. "

Ok so to confirm you are saying that in each case this happens you see
it first compute the wrong hash, but then if you retry the same
flowfile it then provides the correct hash?

Can you please also show/share the lineage history for such a flow
file then?  It should have events for the initial hash, second hash,
the unpacking, trace to the original stream, etc...

Thanks

On Wed, Oct 20, 2021 at 11:00 AM Jens M. Kofoed  wrote:
>
> Dear Mark and Joe
>
> I know my setup isn’t normal for many people. But if we only look at my 
> receive side, which the last mails are about, everything is happening on the 
> same NiFi instance. It is the same 3-node NiFi cluster.
> After fetching a FlowFile-stream file and unpacked it back into NiFi I 
> calculate a sha256. 1 minute later I recalculate the sha256 on the exact 
> same file and get a new hash. That is what worries me.
> The fact that the same file can be recalculated and produce two different 
> hashes, is very strange, but it happens. Over the last 5 months it has only 
> happened 35-40 times.
>
> I can understand if the file is not completely loaded and saved into the 
> content repository before the hashing starts. But I believe that the unpack 
> process don’t forward the flow file to the next process before it is 100% 
> finish unpacking and saving the new content to the repository.
>
> I have a test flow, where a GenerateFlowfile has created 6x 1GB files (2 
> files per node) and next process was a hashcontent before it run into a test 
> loop. Where files are uploaded via PutSFTP to a test server, and downloaded 
> again and recalculated the hash. I have had one issue after 3 days of running.
> Now the test flow is running without the Put/Fetch sftp processors.
>
> Another problem is that I can’t find any correlation to other events. Not 
> within NIFI, nor the server itself or VMWare. If I just could find any other 
> event which happens at the same time, I might be able to force some kind of 
> event to trigger the issue.
> I have tried to force VMware to migrate a NiFi node to another host. Forcing 
> it to do a snapshot and deleting snapshots, but nothing can trigger an error.
>
> I know it will be very very difficult to reproduce. But I will setup multiple 
> NiFi instances running different test flows to see if I can find any reason 
> why it behaves as it does.
>
> Kind Regards
> Jens M. Kofoed
>
> Den 20. okt. 2021 kl. 16.39 skrev Mark Payne :
>
> Jens,
>
> Thanks for sharing the images.
>
> I tried to setup a test to reproduce the issue. I’ve had it running for quite 
> some time. Running through millions of iterations.
>
> I’ve used 5 KB files, 50 KB files, 50 MB files, and larger (to the tune of 
> hundreds of MB). I’ve been unable to reproduce an issue after millions of 
> iterations.
>
> So far I cannot replicate. And since you’re pulling the data via SFTP and 
> then unpacking, which preserves all original attributes from a different 
> system, this can easily become confusing.
>
> Recommend trying to reproduce with SFTP-related processors out of the 
> picture, as Joe is mentioning. Either using GetFile/FetchFile or 
> GenerateFlowFile. Then immediately use CryptographicHashContent to generate 
> an ‘initial hash’, copy that value to another attribute, and then loop, 
> generating the hash and comparing against the original one. I’ll attach a 
> flow that does this, but not sure if the email server will strip out the 
> attachment or not.
>
> This way we remove any possibility of actual corruption between the two nifi 
> instances. If we can still see corruption / different hashes within a single 
> nifi instance, then it certainly warrants further investigation but i can’t 
> see any issues so far.
>
> Thanks
> -Mark
>
>
>
>
>
> On Oct 20, 2021, at 10:21 AM, Joe Witt  wrote:
>
> Jens
>
> Actually is this current loop test contained within a single nifi and there 
> you see corruption happen?
>
> Joe
>
> On Wed, Oct 20, 2021 at 7:14 AM Joe Witt  wrote:
>
> Jens,
>
> You have a very involved setup including other systems (non NiFi).  Have you 
> removed those systems from the equation so you have more evidence to support 
> your expectation that NiFi is doing something other than you expect?
>
> Joe
>
> On Wed, Oct 20, 2021 at 7:10 AM Jens M. Kofoed  wrote:
>
> Hi
>
> Today I have another file which have been r
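The reproduction Mark suggests — hash once, store it as the 'initial hash' attribute, then re-hash in a loop and compare — boils down, outside NiFi, to something like this sketch (file size and loop count are placeholders; NiFi's CryptographicHashContent likewise streams the content in chunks rather than loading it whole):

```python
import hashlib
import os
import tempfile

def sha256_stream(path, chunk_size=8192):
    """Hash a file the way a streaming processor would: fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"x" * (1024 * 1024))      # stand-in for the 1 GB test files

initial = sha256_stream(path)           # the stored 'initial hash'
for _ in range(5):                      # re-hash and compare, like the test loop
    assert sha256_stream(path) == initial, "hash changed on identical content!"
os.remove(path)
```

If this ever fails on stable local storage, the content changed between reads, which is exactly the signal the flow-based loop is looking for.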

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-20 Thread Joe Witt
Jens,

And to further narrow this down

"I have a test flow, where a GenerateFlowfile has created 6x 1GB files
(2 files per node) and next process was a hashcontent before it run
into a test loop. Where files are uploaded via PutSFTP to a test
server, and downloaded again and recalculated the hash. I have had one
issue after 3 days of running."

So to be clear with GenerateFlowFile making these files and then you
looping the content is wholly and fully exclusively within the control
of NiFi.  No Get/Fetch/Put-SFTP of any kind at all.  By looping the
same files over and over in NiFi itself, can you make this happen or
not?

Thanks

On Wed, Oct 20, 2021 at 11:08 AM Joe Witt  wrote:
>
> Jens,
>
> "After fetching a FlowFile-stream file and unpacked it back into NiFi
> I calculate a sha256. 1 minute later I recalculate the sha256 on the
> exact same file and get a new hash. That is what worries me.
> The fact that the same file can be recalculated and produce two
> different hashes, is very strange, but it happens. "
>
> Ok so to confirm you are saying that in each case this happens you see
> it first compute the wrong hash, but then if you retry the same
> flowfile it then provides the correct hash?
>
> Can you please also show/share the lineage history for such a flow
> file then?  It should have events for the initial hash, second hash,
> the unpacking, trace to the original stream, etc...
>
> Thanks
>
> On Wed, Oct 20, 2021 at 11:00 AM Jens M. Kofoed  
> wrote:
> >
> > Dear Mark and Joe
> >
> > I know my setup isn’t normal for many people. But if we only look at my 
> > receive side, which the last mails are about, everything is happening on 
> > the same NiFi instance. It is the same 3-node NiFi cluster.
> > After fetching a FlowFile-stream file and unpacked it back into NiFi I 
> > calculate a sha256. 1 minute later I recalculate the sha256 on the exact 
> > same file and get a new hash. That is what worries me.
> > The fact that the same file can be recalculated and produce two different 
> > hashes, is very strange, but it happens. Over the last 5 months it has 
> > only happened 35-40 times.
> >
> > I can understand if the file is not completely loaded and saved into the 
> > content repository before the hashing starts. But I believe that the unpack 
> > process doesn’t forward the flow file to the next process before it is 100% 
> > finish unpacking and saving the new content to the repository.
> >
> > I have a test flow, where a GenerateFlowfile has created 6x 1GB files (2 
> > files per node) and next process was a hashcontent before it run into a 
> > test loop. Where files are uploaded via PutSFTP to a test server, and 
> > downloaded again and recalculated the hash. I have had one issue after 3 
> > days of running.
> > Now the test flow is running without the Put/Fetch sftp processors.
> >
> > Another problem is that I can’t find any correlation to other events. Not 
> > within NIFI, nor the server itself or VMWare. If I just could find any 
> > other event which happens at the same time, I might be able to force some 
> > kind of event to trigger the issue.
> > I have tried to force VMware to migrate a NiFi node to another host. 
> > Forcing it to do a snapshot and deleting snapshots, but nothing can trigger 
> > an error.
> >
> > I know it will be very very difficult to reproduce. But I will setup 
> > multiple NiFi instances running different test flows to see if I can find 
> > any reason why it behaves as it does.
> >
> > Kind Regards
> > Jens M. Kofoed
> >
> > Den 20. okt. 2021 kl. 16.39 skrev Mark Payne :
> >
> > Jens,
> >
> > Thanks for sharing the images.
> >
> > I tried to setup a test to reproduce the issue. I’ve had it running for 
> > quite some time. Running through millions of iterations.
> >
> > I’ve used 5 KB files, 50 KB files, 50 MB files, and larger (to the tune of 
> > hundreds of MB). I’ve been unable to reproduce an issue after millions of 
> > iterations.
> >
> > So far I cannot replicate. And since you’re pulling the data via SFTP and 
> > then unpacking, which preserves all original attributes from a different 
> > system, this can easily become confusing.
> >
> > Recommend trying to reproduce with SFTP-related processors out of the 
> > picture, as Joe is mentioning. Either using GetFile/FetchFile or 
> > GenerateFlowFile. Then immediately use CryptographicHashContent to generate 
> > an ‘initial hash’, copy that value to another attribute, and then loop, 
> > generating the hash and comparing against the origin

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-20 Thread Joe Witt
Jens,

Also what type of file system/storage system are you running NiFi on
in this case?  We'll need to know this for the NiFi
content/flowfile/provenance repositories.  Is it NFS?

Thanks

On Wed, Oct 20, 2021 at 11:14 AM Joe Witt  wrote:
>
> Jens,
>
> And to further narrow this down
>
> "I have a test flow, where a GenerateFlowfile has created 6x 1GB files
> (2 files per node) and next process was a hashcontent before it run
> into a test loop. Where files are uploaded via PutSFTP to a test
> server, and downloaded again and recalculated the hash. I have had one
> issue after 3 days of running."
>
> So to be clear with GenerateFlowFile making these files and then you
> looping the content is wholly and fully exclusively within the control
> of NiFi.  No Get/Fetch/Put-SFTP of any kind at all.  By looping the
> same files over and over in NiFi itself, can you make this happen or
> not?
>
> Thanks
>
> On Wed, Oct 20, 2021 at 11:08 AM Joe Witt  wrote:
> >
> > Jens,
> >
> > "After fetching a FlowFile-stream file and unpacked it back into NiFi
> > I calculate a sha256. 1 minute later I recalculate the sha256 on the
> > exact same file and get a new hash. That is what worries me.
> > The fact that the same file can be recalculated and produce two
> > different hashes, is very strange, but it happens. "
> >
> > Ok so to confirm you are saying that in each case this happens you see
> > it first compute the wrong hash, but then if you retry the same
> > flowfile it then provides the correct hash?
> >
> > Can you please also show/share the lineage history for such a flow
> > file then?  It should have events for the initial hash, second hash,
> > the unpacking, trace to the original stream, etc...
> >
> > Thanks
> >
> > On Wed, Oct 20, 2021 at 11:00 AM Jens M. Kofoed  
> > wrote:
> > >
> > > Dear Mark and Joe
> > >
> > > I know my setup isn’t normal for many people. But if we only look at my 
> > > receive side, which the last mails are about, everything is happening on 
> > > the same NiFi instance. It is the same 3-node NiFi cluster.
> > > After fetching a FlowFile-stream file and unpacked it back into NiFi I 
> > > calculate a sha256. 1 minute later I recalculate the sha256 on the exact 
> > > same file and get a new hash. That is what worries me.
> > > The fact that the same file can be recalculated and produce two different 
> > > hashes, is very strange, but it happens. Over the last 5 months it has 
> > > only happened 35-40 times.
> > >
> > > I can understand if the file is not completely loaded and saved into the 
> > > content repository before the hashing starts. But I believe that the 
> > > unpack process don’t forward the flow file to the next process before it 
> > > is 100% finish unpacking and saving the new content to the repository.
> > >
> > > I have a test flow, where a GenerateFlowfile has created 6x 1GB files (2 
> > > files per node) and next process was a hashcontent before it run into a 
> > > test loop. Where files are uploaded via PutSFTP to a test server, and 
> > > downloaded again and recalculated the hash. I have had one issue after 3 
> > > days of running.
> > > Now the test flow is running without the Put/Fetch sftp processors.
> > >
> > > Another problem is that I can’t find any correlation to other events. Not 
> > > within NIFI, nor the server itself or VMWare. If I just could find any 
> > > other event which happens at the same time, I might be able to force some 
> > > kind of event to trigger the issue.
> > > I have tried to force VMware to migrate a NiFi node to another host. 
> > > Forcing it to do a snapshot and deleting snapshots, but nothing can 
> > > trigger an error.
> > >
> > > I know it will be very very difficult to reproduce. But I will setup 
> > > multiple NiFi instances running different test flows to see if I can find 
> > > any reason why it behaves as it does.
> > >
> > > Kind Regards
> > > Jens M. Kofoed
> > >
> > > Den 20. okt. 2021 kl. 16.39 skrev Mark Payne :
> > >
> > > Jens,
> > >
> > > Thanks for sharing the images.
> > >
> > > I tried to setup a test to reproduce the issue. I’ve had it running for 
> > > quite some time. Running through millions of iterations.
> > >
> > > I’ve used 5 KB files, 50 KB files, 50 MB files, and larger (to the tune 
> > > of hundreds of MB). I’

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-27 Thread Joe Witt
ty()) {
> if (previousHistogram.equals(histogram)) {
> log.info("Histograms match")
> } else {
> logHistogramDifferences(previousHistogram, histogram)
> session.transfer(flowFile, REL_FAILURE)
> return;
> }
> }
>
> flowFile = session.putAllAttributes(flowFile, histogram)
> session.transfer(flowFile, REL_SUCCESS)
>
>
>
>
>
>
> On Oct 27, 2021, at 9:43 AM, Mark Payne  wrote:
>
> Jens,
>
> For a bit of background here, the reason that Joe and I have expressed 
> interest in NFS file systems is that the way the protocol works, it is 
> allowed to receive packets/chunks of the file out-of-order. So, what happens 
> is let’s say a 1 MB file is being written. The first 500 KB are received. 
> Then instead of the the 501st KB it receives the 503rd KB. What happens is 
> that the size of the file on the file system becomes 503 KB. But what about 
> 501 & 502? Well when you read the data, the file system just returns ASCII 
> NUL characters (byte 0) for those bytes. Once the NFS server receives those 
> bytes, it then goes back and fills in the proper bytes. So if you’re running 
> on NFS, it is possible for the contents of the file on the underlying file 
> system to change out from under you. It’s not clear to me what other types of 
> file system might do something similar.
>
> So, one thing that we can do is to find out whether or not the contents of 
> the underlying file have changed in some way, or if there’s something else 
> happening that could perhaps result in the hashes being wrong. I’ve put 
> together a script that should help diagnose this.
>
> Can you insert an ExecuteScript processor either just before or just after 
> your CryptographicHashContent processor? Doesn’t really matter whether it’s 
> run just before or just after. I’ll attach the script here. It’s a Groovy 
> Script so you should be able to use ExecuteScript with Script Engine = Groovy 
> and the following script as the Script Body. No other changes needed.
>
> The way the script works, it reads in the contents of the FlowFile, and then 
> it builds up a histogram of all byte values (0-255) that it sees in the 
> contents, and then adds that as attributes. So it adds attributes such as:
> histogram.0 = 280273
> histogram.1 = 2820
> histogram.2 = 48202
> histogram.3 = 3820
> …
> histogram.totalBytes = 1780928732
>
> It then checks if those attributes have already been added. If so, after 
> calculating that histogram, it checks against the previous values (in the 
> attributes). If they are the same, the FlowFile goes to ’success’. If they 
> are different, it logs an error indicating the before/after value for any 
> byte whose distribution was different, and it routes to failure.
>
> So, if for example, the first time through it sees 280,273 bytes with a value 
> of ‘0’, and the second time it only sees 12,001, then we know there were a 
> bunch of 0’s previously that were updated to be some other value. And it 
> includes the total number of bytes in case somehow we find that we’re reading 
> too many bytes or not enough bytes or something like that. This should help 
> narrow down what’s happening.
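Outside of NiFi, the same histogram-and-compare idea can be sketched in a few lines of Python (illustrative only — the actual flow uses the Groovy ExecuteScript; names here are made up):

```python
def byte_histogram(data: bytes) -> dict:
    # Count occurrences of every byte value 0-255 plus the total length,
    # mirroring the histogram.0 .. histogram.255 attributes the script writes.
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    hist = {f"histogram.{i}": counts[i] for i in range(256)}
    hist["histogram.totalBytes"] = len(data)
    return hist

def histogram_diff(prev: dict, curr: dict) -> dict:
    # Return only the keys whose counts changed between two passes over the
    # "same" content -- a non-empty result means the bytes read differed.
    return {k: (prev.get(k, 0), curr.get(k, 0))
            for k in curr if prev.get(k, 0) != curr.get(k, 0)}

first = byte_histogram(b"hello world")
second = byte_histogram(b"hello w\x00rld")   # one byte changed underneath us
print(histogram_diff(first, second))
# shows histogram.0 and histogram.111 ('o') counts differing
```

If the diff is empty on the second pass, the content read was byte-for-byte identical; if not, the diff pinpoints exactly which byte values changed and by how much.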
>
> Thanks
> -Mark
>
>
>
> On Oct 26, 2021, at 6:25 PM, Joe Witt  wrote:
>
> Jens
>
> Attached is the flow I was using (now running yours and this one).  Curious 
> if that one reproduces the issue for you as well.
>
> Thanks
>
> On Tue, Oct 26, 2021 at 3:09 PM Joe Witt  wrote:
>>
>> Jens
>>
>> I have your flow running and will keep it running for several days/week to 
>> see if I can reproduce.  Also of note please use your same test flow but use 
>> HashContent instead of crypto hash.  Curious if that matters for any 
>> reason...
>>
>> Still want to know more about your underlying storage system.
>>
>> You could also try updating nifi.properties and changing the following lines:
>> nifi.flowfile.repository.always.sync=true
>> nifi.content.repository.always.sync=true
>> nifi.provenance.repository.always.sync=true
>>
>> It will hurt performance but can be useful/necessary on certain storage 
>> subsystems.
>>
>> Thanks
>>
>> On Tue, Oct 26, 2021 at 12:05 PM Joe Witt  wrote:
>>>
>>> Ignore "For the scenario where you can replicate this please share the 
>>> flow.xml.gz for which it is reproducible."  I see the uploaded JSON
>>>
>>> On Tue, Oct 26, 2021 at 12:04 PM Joe Witt  wrote:
>>>>
>>>> Jens,
>>>>
>>>> We asked about the underlying storage system.  You replied with some info 
>>>> but not the specif

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-28 Thread Joe Witt
Jens,

Am 40+ hours in running both your flow and mine to reproduce.  So far
neither have shown any sign of trouble.  Will keep running for another
week or so if I can.

Thanks

On Wed, Oct 27, 2021 at 12:42 PM Jens M. Kofoed  wrote:
>
> The physical hosts with VMware are using VMFS, but the VMs running
> on the hosts can’t see that.
> But you asked about the underlying file system 😀 and since my first answer 
> with the copy from the fstab file wasn’t enough I just wanted to give all the 
> details 😁.
>
> If you create a vm for windows you would probably use NTFS (on top of vmfs). 
> For Linux EXT3, EXT4, BTRFS, XFS and so on.
>
> All the partitions at my NiFi nodes are local devices (sda, sdb, sdc and
> sdd) for each Linux machine. I don’t use NFS
>
> Kind regards
> Jens
>
>
>
> Den 27. okt. 2021 kl. 17.47 skrev Joe Witt :
>
> Jens,
>
> I don't quite follow the EXT4 usage on top of VMFS but the point here
> is you'll ultimately need to truly understand your underlying storage
> system and what sorts of guarantees it is giving you.  If linux/the
> jvm/nifi think it has a typical EXT4 type block storage system to work
> with it can only be safe/operate within those constraints.  I have no
> idea about what VMFS brings to the table or the settings for it.
>
> The sync properties I shared previously might help force the issue of
> ensuring a formal sync/flush cycle all the way through the disk has
> occurred which we'd normally not do or need to do but again in some
> cases offers a stronger guarantee in exchange for performance.
>
> In any case...Mark's path for you here will help identify what we're
> dealing with and we can go from there.
>
> I am aware of significant usage of NiFi on VMWare configurations
> without issue at high rates for many years so whatever it is here is
> likely solvable.
>
> Thanks
>
> On Wed, Oct 27, 2021 at 7:28 AM Jens M. Kofoed  wrote:
>
>
> Hi Mark
>
>
> Thanks for the clarification. I will implement the script when I return to
> the office on Monday next week (November 1st).
>
> I don’t use NFS, but ext4. But I will implement the script so we can check if 
> it’s the case here. But I think the issue might be after the processors 
> writing content to the repository.
>
> I have a test flow running for more than 2 weeks without any errors. But this
> flow only calculates hashes and compares them.
>
>
> Two other flows both create errors. One flow uses 
> PutSFTP->FetchSFTP->CryptographicHashContent->compares. The other flow uses 
> MergeContent->UnpackContent->CryptographicHashContent->compares. The last 
> flow is totally inside NiFi, excluding other network/server issues.
>
>
> In both cases the CryptographicHashContent is right after a process which 
> writes new content to the repository. But in one case a file in our 
> production flow did calculate a wrong hash 4 times with a 1 minute delay 
> between each calculation. A few hours later I looped the file back and this 
> time it was OK.
>
> Just like the case in step 5 and 12 in the pdf file
>
>
> I will let you all know more later next week
>
>
> Kind regards
>
> Jens
>
>
>
>
> Den 27. okt. 2021 kl. 15.43 skrev Mark Payne :
>
>
> And the actual script:
>
>
>
> import org.apache.nifi.flowfile.FlowFile
>
>
> import java.util.stream.Collectors
>
>
> Map<String, String> getPreviousHistogram(final FlowFile flowFile) {
>     final Map<String, String> histogram = flowFile.getAttributes().entrySet().stream()
>         .filter({ entry -> entry.getKey().startsWith("histogram.") })
>         .collect(Collectors.toMap({ entry -> entry.key }, { entry -> entry.value }))
>     return histogram;
> }
>
> Map<String, String> createHistogram(final FlowFile flowFile, final InputStream inStream) {
>     final Map<String, String> histogram = new HashMap<>();
>     final int[] distribution = new int[256];
>     Arrays.fill(distribution, 0);
>
>     long total = 0L;
>     final byte[] buffer = new byte[8192];
>     int len;
>     while ((len = inStream.read(buffer)) > 0) {
>         for (int i = 0; i < len; i++) {
>             final int val = buffer[i] & 0xFF; // mask so negative byte values map to 128-255
>             distribution[val]++;
>             total++;
>         }
>     }
>
>     for (int i = 0; i < 256; i++) {
>         histogram.put("histogram." + i, String.valueOf(distribution[i]));
>     }
>     histogram.put("histogram.totalBytes", String.valueOf(total));
>
>     return histogram;
> }
>
>
> void logHistogramDifferences(final Map<String, String> previo

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-29 Thread Joe Witt
Jens

Update from hour 67.  Still lookin' good.

Will advise.

Thanks

On Thu, Oct 28, 2021 at 8:08 AM Jens M. Kofoed  wrote:
>
> Many many thanks 🙏 Joe for looking into this. My test flow was running for 6 
> days before the first error occurred
>
> Thanks
>
> > Den 28. okt. 2021 kl. 16.57 skrev Joe Witt :
> >
> > Jens,
> >
> > Am 40+ hours in running both your flow and mine to reproduce.  So far
> > neither have shown any sign of trouble.  Will keep running for another
> > week or so if I can.
> >
> > Thanks
> >
> >> On Wed, Oct 27, 2021 at 12:42 PM Jens M. Kofoed  
> >> wrote:
> >>
> >> The Physical hosts with VMWare is using the vmfs but the vm machines 
> >> running at hosts can’t see that.
> >> But you asked about the underlying file system 😀 and since my first answer 
> >> with the copy from the fstab file wasn’t enough I just wanted to give all 
> >> the details 😁.
> >>
> >> If you create a vm for windows you would probably use NTFS (on top of 
> >> vmfs). For Linux EXT3, EXT4, BTRFS, XFS and so on.
> >>
> >> All the partitions at my nifi nodes, are local devices (sda, sdb, sdc and 
> >> sdd) for each Linux machine. I don’t use nfs
> >>
> >> Kind regards
> >> Jens
> >>
> >>
> >>
> >> Den 27. okt. 2021 kl. 17.47 skrev Joe Witt :
> >>
> >> Jens,
> >>
> >> I don't quite follow the EXT4 usage on top of VMFS but the point here
> >> is you'll ultimately need to truly understand your underlying storage
> >> system and what sorts of guarantees it is giving you.  If linux/the
> >> jvm/nifi think it has a typical EXT4 type block storage system to work
> >> with it can only be safe/operate within those constraints.  I have no
> >> idea about what VMFS brings to the table or the settings for it.
> >>
> >> The sync properties I shared previously might help force the issue of
> >> ensuring a formal sync/flush cycle all the way through the disk has
> >> occurred which we'd normally not do or need to do but again in some
> >> cases offers a stronger guarantee in exchange for performance.
> >>
> >> In any case...Mark's path for you here will help identify what we're
> >> dealing with and we can go from there.
> >>
> >> I am aware of significant usage of NiFi on VMWare configurations
> >> without issue at high rates for many years so whatever it is here is
> >> likely solvable.
> >>
> >> Thanks
> >>
> >> On Wed, Oct 27, 2021 at 7:28 AM Jens M. Kofoed  
> >> wrote:
> >>
> >>
> >> Hi Mark
> >>
> >>
> >> Thanks for the clarification. I will implement the script when I return to 
> >> the office at Monday next week ( November 1st).
> >>
> >> I don’t use NFS, but ext4. But I will implement the script so we can check 
> >> if it’s the case here. But I think the issue might be after the processors 
> >> writing content to the repository.
> >>
> >> I have a test flow running for more than 2 weeks without any errors. But 
> >> this flow only calculate hash and comparing.
> >>
> >>
> >> Two other flows both create errors. One flow use 
> >> PutSFTP->FetchSFTP->CryptographicHashContent->compares. The other flow use 
> >> MergeContent->UnpackContent->CryptographicHashContent->compares. The last 
> >> flow is totally inside nifi, excluding other network/server issues.
> >>
> >>
> >> In both cases the CryptographicHashContent is right after a process which 
> >> writes new content to the repository. But in one case a file in our 
> >> production flow did calculate a wrong hash 4 times with a 1 minutes delay 
> >> between each calculation. A few hours later I looped the file back and 
> >> this time it was OK.
> >>
> >> Just like the case in step 5 and 12 in the pdf file
> >>
> >>
> >> I will let you all know more later next week
> >>
> >>
> >> Kind regards
> >>
> >> Jens
> >>
> >>
> >>
> >>
> >> Den 27. okt. 2021 kl. 15.43 skrev Mark Payne :
> >>
> >>
> >> And the actual script:
> >>
> >>
> >>
> >> import org.apache.nifi.flowfile.FlowFile
> >>
> >>
> >> import java.util.stream.Collectors
> >>
> >>
> >&

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-10-31 Thread Joe Witt
Jens

118 hours in - still goood.

Thanks

On Fri, Oct 29, 2021 at 10:22 AM Joe Witt  wrote:
>
> Jens
>
> Update from hour 67.  Still lookin' good.
>
> Will advise.
>
> Thanks
>
> On Thu, Oct 28, 2021 at 8:08 AM Jens M. Kofoed  wrote:
> >
> > Many many thanks 🙏 Joe for looking into this. My test flow was running for 
> > 6 days before the first error occurred
> >
> > Thanks
> >
> > > Den 28. okt. 2021 kl. 16.57 skrev Joe Witt :
> > >
> > > Jens,
> > >
> > > Am 40+ hours in running both your flow and mine to reproduce.  So far
> > > neither have shown any sign of trouble.  Will keep running for another
> > > week or so if I can.
> > >
> > > Thanks
> > >
> > >> On Wed, Oct 27, 2021 at 12:42 PM Jens M. Kofoed  
> > >> wrote:
> > >>
> > >> The Physical hosts with VMWare is using the vmfs but the vm machines 
> > >> running at hosts can’t see that.
> > >> But you asked about the underlying file system 😀 and since my first 
> > >> answer with the copy from the fstab file wasn’t enough I just wanted to 
> > >> give all the details 😁.
> > >>
> > >> If you create a vm for windows you would probably use NTFS (on top of 
> > >> vmfs). For Linux EXT3, EXT4, BTRFS, XFS and so on.
> > >>
> > >> All the partitions at my nifi nodes, are local devices (sda, sdb, sdc 
> > >> and sdd) for each Linux machine. I don’t use nfs
> > >>
> > >> Kind regards
> > >> Jens
> > >>
> > >>
> > >>
> > >> Den 27. okt. 2021 kl. 17.47 skrev Joe Witt :
> > >>
> > >> Jens,
> > >>
> > >> I don't quite follow the EXT4 usage on top of VMFS but the point here
> > >> is you'll ultimately need to truly understand your underlying storage
> > >> system and what sorts of guarantees it is giving you.  If linux/the
> > >> jvm/nifi think it has a typical EXT4 type block storage system to work
> > >> with it can only be safe/operate within those constraints.  I have no
> > >> idea about what VMFS brings to the table or the settings for it.
> > >>
> > >> The sync properties I shared previously might help force the issue of
> > >> ensuring a formal sync/flush cycle all the way through the disk has
> > >> occurred which we'd normally not do or need to do but again in some
> > >> cases offers a stronger guarantee in exchange for performance.
> > >>
> > >> In any case...Mark's path for you here will help identify what we're
> > >> dealing with and we can go from there.
> > >>
> > >> I am aware of significant usage of NiFi on VMWare configurations
> > >> without issue at high rates for many years so whatever it is here is
> > >> likely solvable.
> > >>
> > >> Thanks
> > >>
> > >> On Wed, Oct 27, 2021 at 7:28 AM Jens M. Kofoed  
> > >> wrote:
> > >>
> > >>
> > >> Hi Mark
> > >>
> > >>
> > >> Thanks for the clarification. I will implement the script when I return 
> > >> to the office at Monday next week ( November 1st).
> > >>
> > >> I don’t use NFS, but ext4. But I will implement the script so we can 
> > >> check if it’s the case here. But I think the issue might be after the 
> > >> processors writing content to the repository.
> > >>
> > >> I have a test flow running for more than 2 weeks without any errors. But 
> > >> this flow only calculate hash and comparing.
> > >>
> > >>
> > >> Two other flows both create errors. One flow use 
> > >> PutSFTP->FetchSFTP->CryptographicHashContent->compares. The other flow 
> > >> use MergeContent->UnpackContent->CryptographicHashContent->compares. The 
> > >> last flow is totally inside nifi, excluding other network/server issues.
> > >>
> > >>
> > >> In both cases the CryptographicHashContent is right after a process 
> > >> which writes new content to the repository. But in one case a file in 
> > >> our production flow did calculate a wrong hash 4 times with a 1 minutes 
> > >> delay between each calculation. A few hours later I looped the file back 
> > >> and this time it was OK.
> > >>
> > >> Just l

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-11-03 Thread Joe Witt
61;11926761
>> histogram.62;11927605
>> histogram.63;23858926
>> histogram.64;11929516
>> histogram.65;11930217
>> histogram.66;11930478
>> histogram.67;11939855
>> histogram.68;11927850
>> histogram.69;11931154
>> histogram.7;0
>> histogram.70;11935374
>> histogram.71;11930754
>> histogram.72;11928304
>> histogram.73;11931772
>> histogram.74;11939417
>> histogram.75;11930712
>> histogram.76;1191
>> histogram.77;11931279
>> histogram.78;11928276
>> histogram.79;11930071
>> histogram.8;0
>> histogram.80;11927830
>> histogram.81;11931213
>> histogram.82;11930964
>> histogram.83;11928973
>> histogram.84;11934325
>> histogram.85;11929658
>> histogram.86;11924667
>> histogram.87;11931100
>> histogram.88;11930252
>> histogram.89;11927281
>> histogram.9;11932848
>> histogram.90;11930398
>> histogram.91;0
>> histogram.92;0
>> histogram.93;0
>> histogram.94;11928720
>> histogram.95;11928988
>> histogram.96;0
>> histogram.97;11931423
>> histogram.98;11928181
>> histogram.99;11935549
>> histogram.totalBytes;1073741824
>>
>> File3:
>> histogram.0;0
>> histogram.1;0
>> histogram.10;11930417
>> histogram.100;11926739
>> histogram.101;11930580
>> histogram.102;11928210
>> histogram.103;11935300
>> histogram.104;11925804
>> histogram.105;11931023
>> histogram.106;11932342
>> histogram.107;11929778
>> histogram.108;11930098
>> histogram.109;11930759
>> histogram.11;0
>> histogram.110;11934343
>> histogram.111;11935775
>> histogram.112;11933877
>> histogram.113;11926675
>> histogram.114;11929332
>> histogram.115;11928876
>> histogram.116;11927819
>> histogram.117;11932657
>> histogram.118;11933508
>> histogram.119;11928808
>> histogram.12;0
>> histogram.120;11937532
>> histogram.121;11926907
>> histogram.122;11933942
>> histogram.123;0
>> histogram.124;0
>> histogram.125;0
>> histogram.126;0
>> histogram.127;0
>> histogram.128;0
>> histogram.129;0
>> histogram.13;0
>> histogram.130;0
>> histogram.131;0
>> histogram.132;0
>> histogram.133;0
>> histogram.134;0
>> histogram.135;0
>> histogram.136;0
>> histogram.137;0
>> histogram.138;0
>> histogram.139;0
>> histogram.14;0
>> histogram.140;0
>> histogram.141;0
>> histogram.142;0
>> histogram.143;0
>> histogram.144;0
>> histogram.145;0
>> histogram.146;0
>> histogram.147;0
>> histogram.148;0
>> histogram.149;0
>> histogram.15;0
>> histogram.150;0
>> histogram.151;0
>> histogram.152;0
>> histogram.153;0
>> histogram.154;0
>> histogram.155;0
>> histogram.156;0
>> histogram.157;0
>> histogram.158;0
>> histogram.159;0
>> histogram.16;0
>> histogram.160;0
>> histogram.161;0
>> histogram.162;0
>> histogram.163;0
>> histogram.164;0
>> histogram.165;0
>> histogram.166;0
>> histogram.167;0
>> histogram.168;0
>> histogram.169;0
>> histogram.17;0
>> histogram.170;0
>> histogram.171;0
>> histogram.172;0
>> histogram.173;0
>> histogram.174;0
>> histogram.175;0
>> histogram.176;0
>> histogram.177;0
>> histogram.178;0
>> histogram.179;0
>> histogram.18;0
>> histogram.180;0
>> histogram.181;0
>> histogram.182;0
>> histogram.183;0
>> histogram.184;0
>> histogram.185;0
>> histogram.186;0
>> histogram.187;0
>> histogram.188;0
>> histogram.189;0
>> histogram.19;0
>> histogram.190;0
>> histogram.191;0
>> histogram.192;0
>> histogram.193;0
>> histogram.194;0
>> histogram.195;0
>> histogram.196;0
>> histogram.197;0
>> histogram.198;0
>> histogram.199;0
>> histogram.2;0
>> histogram.20;0
>> histogram.200;0
>> histogram.201;0
>> histogram.202;0
>> histogram.203;0
>> histogram.204;0
>> histogram.205;0
>> histogram.206;0
>> histogram.207;0
>> histogram.208;0
>> histogram.209;0
>> histogram.21;0
>> histogram.210;0
>> histogram.211;0
>> histogram.212;0
>> histogram.213;0
>> histogram.214;0
>> histogram.215;0
>> histogram.216;0
>> histogram.217;0
>> histogram.218;0
>> histogram.219;0
>> histogram.22;0
>> histogram.220;0
>> histogram.221;0
>> histogram.222;0
>> histogram.223;0
>> histogram.224;0
>> histogram.225

Re: CryptographicHashContent calculates 2 differents sha256 hashes on the same content

2021-11-03 Thread Joe Witt
Jens,

I think we're at a loss as to how to help you further with your
specific installation.  We have attempted to recreate the scenario
with no luck.  We've offered suggestions on experiments which would
help us narrow in but you don't think that will help.

At this point we'll probably have to leave this thread here.  If you
used the forced sync properties we mentioned and it is still happening
then you can pretty much ensure the issue is with the JVM or the
virtual file system mechanism.

Thanks
Joe

On Wed, Nov 3, 2021 at 8:09 AM Jens M. Kofoed  wrote:
>
> Hi Mark
>
> All the files in my testflow are 1GB files. But it happens in my production 
> flow with different file sizes.
>
> When these issues have happened, I have the flowfile routed to an 
> updateAttribute process which is disabled. Just to keep the file in a queue. 
> When I enable the process and send the file back to a new hash calculation, the file 
> is OK. So I don’t think the test with backup and compare makes any sense to 
> do.
>
> Regards
> Jens
>
> > Den 3. nov. 2021 kl. 15.57 skrev Mark Payne :
> >
> > So what I found interesting about the histogram output was that in each 
> > case, the input file was 1 GB. The number of bytes that differed between 
> > the ‘good’ and ‘bad’ hashes was something like 500-700 bytes whose values 
> > were different. But the values ranged significantly. There was no 
> > indication that the type of thing we’ve seen with NFS mounts was happening, 
> > where data was nulled out until received and then updated. If that had been 
> > the case we’d have seen the NUL byte (or some other value) have a very 
> > significant change in the histogram, but we didn’t see that.
> >
> > So a couple more ideas that I think can be useful.
> >
> > 1) Which garbage collector are you using? It’s configured in the 
> > bootstrap.conf file
> >
> > 2) We can try to definitively prove out whether the content on the disk is 
> > changing or if there’s an issue reading the content. To do this:
> >
> > 1. Stop all processors.
> > 2. Shutdown nifi
> > 3. rm -rf content_repository; rm -rf flowfile_repository   (warning, this 
> > will delete all FlowFiles & content, so only do this on a dev/test system 
> > where you’re comfortable deleting it!)
> > 4. Start nifi
> > 5. Let exactly 1 FlowFile into your flow.
> > 6. While it is looping through, create a copy of your entire Content 
> > Repository: cp -r content_repository content_backup1; zip 
> > content_backup1.zip content_backup1
> > 7. Wait for the hashes to differ
> > 8. Create another copy of the Content Repository: cp -r content_repository 
> > content_backup2
> > 9. Find the files within the content_backup1 and content_backup2 and 
> > compare them to see if they are identical. Would recommend comparing them 
> > using each of the 3 methods: sha256, sha512, diff
> >
> > This should make it pretty clear that either:
> > (1) the issue resides in the software: either NiFi or the JVM
> > (2) the issue resides outside of the software: the disk, the disk driver, 
> > the operating system, the VM hypervisor, etc.
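Step 9's comparison can be scripted. Here is a minimal Python sketch (the directory names and the simulated on-disk mutation are illustrative — in practice the two roots would be the content_backup1 and content_backup2 copies taken in steps 6 and 8) that hashes every file under two snapshot directories and reports any whose bytes differ:

```python
import hashlib
import tempfile
from pathlib import Path

def tree_digests(root: Path) -> dict:
    # Map each file's path (relative to root) to its SHA-256, so two
    # snapshots of the content repository can be compared file by file.
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*")) if p.is_file()
    }

# Demo with two throwaway snapshots standing in for the real backups.
tmp = Path(tempfile.mkdtemp())
snap1, snap2 = tmp / "content_backup1", tmp / "content_backup2"
for snap in (snap1, snap2):
    snap.mkdir()
    (snap / "claim-1").write_bytes(b"same bytes")
    (snap / "claim-2").write_bytes(b"original")
(snap2 / "claim-2").write_bytes(b"mutated!")   # simulate an on-disk change

before, after = tree_digests(snap1), tree_digests(snap2)
changed = {p for p in before.keys() & after.keys() if before[p] != after[p]}
print("files whose bytes changed on disk:", changed)
```

A non-empty result points at the storage layer (content changed under NiFi); an empty result while the hashes still differ points at the software reading it.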
> >
> > Thanks
> > -Mark
> >
> >> On Nov 3, 2021, at 10:44 AM, Joe Witt  wrote:
> >>
> >> Jens,
> >>
> >> 184 hours (7.6 days) in and zero issues.
> >>
> >> Will need to turn this off soon but wanted to give a final update.
> >> Looks great.  Given the information on your system there appears to be
> >> something we dont understand related to the virtual file system
> >> involved or something.
> >>
> >> Thanks
> >>
> >>> On Tue, Nov 2, 2021 at 10:55 PM Jens M. Kofoed  
> >>> wrote:
> >>>
> >>> Hi Mark
> >>>
> >>> Of course, sorry :-)  By looking at the error messages, I can see that it 
> >>> is only the histograms which has differences which is listed. And all 3 
> >>> have the first issue at histogram.9. Don't know what that means
> >>>
> >>> /Jens
> >>>
> >>> Here are the error log:
> >>> 2021-11-01 23:57:21,955 ERROR [Timer-Driven Process Thread-10] 
> >>> org.apache.nifi.processors.script.ExecuteScript 
> >>> ExecuteScript[id=c7d3335b-1045-14ed--a0d62c70] There are 
> >>> differences in the histogram
> >>> Byte Value: histogram.10, Previous Count: 11926720, New Count: 11926721
> >>> Byte Value: histogram.100, Previous Count: 11927504, New Count: 11927503
> >>> Byte Value: histogram.101, Previ

[ANNOUNCE] Apache NiFi 1.15.0 release

2021-11-08 Thread Joe Witt
Hello

The Apache NiFi team would like to announce the release of Apache NiFi 1.15.0.

This is an important feature bearing release bringing things like
parameter context
inheritance, component validation against external resources, ability
to run stateless
nifi flows within nifi itself, and far more.

Apache NiFi is an easy to use, powerful, and reliable system to
process and distribute
data.  Apache NiFi was made for dataflow.  It supports highly
configurable directed graphs
of data routing, transformation, and system mediation logic.

More details on Apache NiFi can be found here:
https://nifi.apache.org/

The release artifacts can be downloaded from here:
https://nifi.apache.org/download.html

Maven artifacts have been made available and mirrored as per normal
ASF artifact processes.

Issues closed/resolved for this list can be found here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12350382

Release note highlights can be found here:
https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.15.0

Thank you
The Apache NiFi team


Re: NiFi Fails to Reconnect to Zookeeper After an Outage

2021-11-30 Thread Joe Witt
Shawn,

I'm not aware of any specific action.  Can you please file a JIRA with
as much detail as possible?

Thanks

On Tue, Nov 30, 2021 at 7:58 AM Shawn Weeks  wrote:
>
> Does anyone know if the patch to the curator library ever made it into NiFi? 
> Still seeing this issue where once NiFi has lost its connection to Zookeeper 
> it will never recover and thus never reconnect to the cluster.
>
>
>
> Thanks
>
> Shawn
>
>
>
> From: Shawn Weeks 
> Sent: Wednesday, September 15, 2021 1:12 PM
> To: users@nifi.apache.org
> Subject: NiFi Fails to Reconnect to Zookeeper After an Outage
>
>
>
> Had a Zookeeper cluster go down and after things came back up NiFi seemed 
> stuck and wouldn’t ever reestablish the cluster. The following error was 
> repeated on the node that wouldn’t rejoin. Googling the first message 
> mentions a bug in the curator library that causes it to never reconnect to 
> Zookeeper after an issue. See 
> https://issues.apache.org/jira/browse/CURATOR-405 for an example. This is on 
> 1.14.0 against Zookeeper 3.6.3
>
>
>
> 2021-09-15 18:03:20,644 WARN [Curator-ConnectionStateManager-0] 
> o.a.c.f.state.ConnectionStateManager Session timeout has elapsed while 
> SUSPENDED. Injecting a session expiration. Elapsed ms: 1. Adjusted 
> session timeout ms: 1
>
> 2021-09-15 18:03:25,201 WARN [Clustering Tasks Thread-2] 
> o.apache.nifi.controller.FlowController Failed to send heartbeat due to: 
> org.apache.nifi.cluster.protocol.ProtocolException: Cannot send heartbeat 
> because there is no Cluster Coordinator currently elected
>
>
>
>


Re: Nifi 1.14 user authentication using openId connect not working

2021-12-08 Thread Joe Witt
dropped dev.  bcc ganesh

Ganesh

Please do not email more than one mailing list. Also, you received a great
reply. I'll forward it to you.

Thanks

On Wed, Dec 8, 2021 at 9:30 PM Ganesh, B (Nokia - IN/Bangalore) <
b.gan...@nokia.com> wrote:

> Hi ,
>
>
>
> Currently we are blocked due to this. Could you please help us to
> resolve the below issue.
>
>
>
> *From:* Ganesh, B (Nokia - IN/Bangalore)
> *Sent:* Wednesday, December 8, 2021 4:24 PM
> *To:* users@nifi.apache.org; d...@nifi.apache.org
> *Subject:* Nifi 1.14 user authentication using openId connect not working
>
>
>
> Hi ,
>
>
>
> We are using apache nifi 1.14 .  We have 3 nodes in nifi cluster , cluster
> is using external zookeeper for state management.
>
> We are using openId connect for the user authentication . following are
> the relevant configuration in nifi.properties file .
>
> *nifi.security.user.authorizer=managed-authorizer*
>
> *nifi.security.allow.anonymous.authentication=false*
>
> *nifi.security.user.login.identity.provider=*
>
> *.*
>
> *………..*
>
> *# OpenId Connect SSO Properties #*
>
> *nifi.security.user.oidc.discovery.url=https:// SERVER>/access/realms/nifi/.well-known/openid-configuration*
>
> *nifi.security.user.oidc.connect.timeout=5 secs*
>
> *nifi.security.user.oidc.read.timeout=5 secs*
>
> *nifi.security.user.oidc.client.id=nifi-client*
>
> *nifi.security.user.oidc.client.secret=*
>
> *nifi.security.user.oidc.preferred.jwsalgorithm=RS256*
>
>
>
> *But we are observing *
>
> org.springframework.beans.factory.UnsatisfiedDependencyException: Error
> creating bean with name
> 'org.springframework.security.config.annotation.web.configuration.WebSecurityConfiguration':
> Unsatisfied dependency expressed through method
> 'setFilterChainProxySecurityConfigurer' parameter 1; nested exception is
> org.springframework.beans.factory.BeanExpressionException: Expression
> parsing failed; nested exception is
> org.springframework.beans.factory.UnsatisfiedDependencyException: Error
> creating bean with name
> 'org.apache.nifi.web.NiFiWebApiSecurityConfiguration': Unsatisfied
> dependency expressed through method 'setJwtAuthenticationProvider'
> parameter 0; nested exception is
> org.springframework.beans.factory.BeanCreationException: Error creating
> bean with name 'jwtAuthenticationProvider' defined in class path resource
> [nifi-web-security-context.xml]: Cannot resolve reference to bean
> 'authorizer' while setting constructor argument; nested exception is
> org.springframework.beans.factory.BeanCreationException: Error creating
> bean with name 'authorizer': FactoryBean threw exception on object
> creation; nested exception is java.lang.reflect.InvocationTargetException
>
> at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredMethodElement.resolveMethodArguments(AutowiredAnnotationBeanPostProcessor.java:768)
> at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredMethodElement.inject(AutowiredAnnotationBeanPostProcessor.java:720)
> at org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:119)
> at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor.postProcessProperties(AutowiredAnnotationBeanPostProcessor.java:399)
> at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1413)
> at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:601)
> at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:524)
> at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:335)
> at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234)
> at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:333)
> at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208)
> at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:944)
> at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:918)
> at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:583)
> at org.springframework.web.context.ContextLoader.configureAndRefreshWebApplicationContext(ContextLoader.java:401)
> at org.springframework.web.context.ContextLoader.

Re: Nifi 1.14 vulnerabilities critical

2021-12-12 Thread Joe Witt
Ganesh

You and/or another person on your email were already replied to on the
proper alias, which is the security alias.

In any event since we are now also here we will share the same message

Me: We regularly perform such scans as well.  If we confirm we use a
vulnerable library in a way that exposes the vulnerability, we act quickly
to resolve it.  We generally do not backport to older lines and instead
continually improve the release going forward.  The current release is 1.15
and we are working on 1.16.

Apache Security/Mark: Outdated dependencies are not always security
issues.  A project would only be affected if a dependency was used in such
a way that the affected underlying code is used and the vulnerabilities
were exposed.  We typically get reports sent to us from scanning tools that
look at dependencies out of context of how they are actually used in the
projects.  As such we reject these reports and suggest you either a) show
how the product is affected by the dependency vulnerabilities, or b) simply
mention this as a normal bug report to that project.  Since dependency
vulnerabilities are quite public, there is no need to use this private
reporting mechanism for them.

Thanks

On Sun, Dec 12, 2021 at 10:08 PM Ganesh, B (Nokia - IN/Bangalore) <
b.gan...@nokia.com> wrote:

> Hi,
>
> As part of an upgrade from nifi-1.13.2 to nifi-1.14.0 we performed scans on
> nifi 1.14.0, and as a result there are a few critical and high
> vulnerabilities.
>
> Critical vulnerabilities
>
> Vulnerability Id | Severity | Path                                    | Fix available | Link
> CVE-2017-7657    | Critical | /opt/nifi/lib/jetty-schemas-3.1.jar     | None          | https://nvd.nist.gov/vuln/detail/CVE-2017-7657
> CVE-2017-7658    | Critical | /opt/nifi/lib/jetty-schemas-3.1.jar     | None          | https://nvd.nist.gov/vuln/detail/CVE-2017-7658
> CVE-2019-12415   | Critical | /opt/nifi/lib/nifi-nar-utils-1.14.0.jar | None          | https://anchore.int.net.nokia.com:443/v1/query/vulnerabilities?id=VULNDB-216029
>
>
>
> High Vulnerabilities
>
> Vulnerability Id | Severity | Path                                            | Fix available | Link
> CVE-2017-7656    | High     | /opt/nifi/lib/jetty-schemas-3.1.jar             | None          | https://nvd.nist.gov/vuln/detail/CVE-2017-7656
> CVE-2017-9735    | High     | /opt/nifi/lib/jetty-schemas-3.1.jar             | None          | https://nvd.nist.gov/vuln/detail/CVE-2017-9735
> CVE-2020-27216   | High     | /opt/nifi/lib/jetty-schemas-3.1.jar             | None          | https://nvd.nist.gov/vuln/detail/CVE-2020-27216
> VULNDB-256815    | High     | /opt/nifi-toolkit/lib/commons-compress-1.20.jar | None          | https://repo1.dso.mil/dsop/opensource/apache/nifi/-/issues/13
> VULNDB-257084    | High     | /opt/nifi-toolkit/lib/commons-compress-1.20.jar | None          | https://repo1.dso.mil/dsop/opensource/apache/nifi/-/issues/13
>
>
>
>
>
> One or two vulnerabilities are fixed in 1.15, for example CVE-2020-17521:
> https://issues.apache.org/jira/browse/NIFI-8990.
>
>
>
> Could you please help us understand the impact and the fix version, or the
> possibility of fixing these in 1.14 itself?
>
>
>
> Thanks & Regards,
>
> Ganesh.B
>
>
>
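
As the reply above notes, a scanner hit only matters if the flagged code
actually ships and is used.  A cheap first triage step is to check whether
the class files behind a CVE are present in the install at all.  Here is a
rough sketch of that check (the function name and layout are mine, not an
Apache tool); it searches jars and nars under a directory for a given class
entry, looking one level into nested jars since NiFi nars are zip files that
bundle their dependency jars:

```python
import io
import zipfile
from pathlib import Path


def jars_containing(root: str, class_entry: str):
    """Yield (archive, nested_jar) pairs for every jar/nar under root that
    bundles the given class entry; nested_jar is None when the class sits
    directly in the top-level archive."""
    for path in Path(root).rglob("*"):
        if path.suffix not in (".jar", ".nar", ".war"):
            continue
        with zipfile.ZipFile(path) as zf:
            names = zf.namelist()
            if class_entry in names:
                yield str(path), None
            # NiFi nars carry their dependencies as jars inside the archive,
            # so scan each bundled jar as well.
            for inner in (n for n in names if n.endswith(".jar")):
                with zipfile.ZipFile(io.BytesIO(zf.read(inner))) as nested:
                    if class_entry in nested.namelist():
                        yield str(path), inner
```

Presence alone still does not prove exploitability; it is the reverse check
(absence of the class) that cheaply rules a report out.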


Re: Log4j Patch Util

2021-12-15 Thread Joe Witt
Bryan

This type of approach would work generally quite fine.  Did you paste
the link you intended or did you forget to link to the patch?

Thanks

On Wed, Dec 15, 2021 at 12:01 PM Bryan Rosander  wrote:
>
> Hey all,
>
> I wrote up a utility to patch all nars in a given NiFi install to remove 
> JndiLookup.class from log4j jars.  It has no dependencies and the single file 
> can be compiled and run as-is.
>
> It looks like it should be handled pretty well if the class is just missing 
> since they didn't expect it to be available on Android. [1]
>
> It does not attempt to update already unpacked nars so I'd suggest stopping 
> NiFi and removing the work/nar directory before running.
>
> Usage:
>
> 1. Put by itself in a directory
> 2. Compile 'javac Log4jPatch.java'
> 3. Run 'java Log4jPatch'
>
> Verify (optionally do before patch to validate that the grep pattern works, 
> you have the vulnerable class file):
>
> 1. Start NiFi, wait for it to unpack all nars.
> 2. Run this in NIFI_HOME: 'find . -iname "*log4j*" | xargs grep -i 
> jndilookup.class'
>
> I'm looking for feedback around the approach.  Anyone's free to take this and 
> use it how they want to.
>
> Thanks,
> Bryan
>
> [1] 
> https://github.com/apache/logging-log4j2/blob/rel/2.8.2/log4j-core/src/main/java/org/apache/logging/log4j/core/lookup/Interpolator.java#L100-L106
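
The core move Bryan's utility makes, rewriting each log4j jar without
JndiLookup.class, can be illustrated in a few lines.  This is a sketch of
the technique only, not Bryan's Java code, and unlike his tool it operates
on a single jar rather than walking every nar (zip archives cannot delete an
entry in place, so the archive is copied entry by entry to a temp file):

```python
import os
import shutil
import tempfile
import zipfile

TARGET = "org/apache/logging/log4j/core/lookup/JndiLookup.class"


def strip_entry(jar_path: str, entry: str = TARGET) -> bool:
    """Rewrite jar_path without the given entry.  Returns True if the
    entry was present and removed, False if there was nothing to do."""
    with zipfile.ZipFile(jar_path) as zf:
        if entry not in zf.namelist():
            return False
        fd, tmp = tempfile.mkstemp(suffix=".jar")
        os.close(fd)
        # Copy every entry except the one being removed, preserving each
        # entry's metadata (including its compression method).
        with zipfile.ZipFile(tmp, "w") as out:
            for item in zf.infolist():
                if item.filename != entry:
                    out.writestr(item, zf.read(item.filename))
    shutil.move(tmp, jar_path)
    return True
```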


Re: Log4j Patch Util

2021-12-15 Thread Joe Witt
Bryan

You did it right - I was just a dope and didn't scroll down far enough
:). The link is a good call though too.

I thought the list blocked attachments actually.

Anyway thanks for sharing that.  It is an option for folks to consider.

Thanks

On Wed, Dec 15, 2021 at 12:17 PM Bryan Rosander  wrote:
>
> Hey Joe,
>
> Sorry if I didn't attach it properly.  The archive client seems to see it [1]
>
> I created a gist in case something else is wrong. [2]
>
> Thanks,
> Bryan
>
> [1] https://lists.apache.org/thread/v8ydn3bgkgspf2vh8j0d0zygzdkwb7k0
> [2] https://gist.github.com/brosander/a6f5075535772c60605c1544a91d56f5
>
> On Wed, Dec 15, 2021 at 2:06 PM Joe Witt  wrote:
>>
>> Bryan
>>
>> This type of approach would work generally quite fine.  Did you paste
>> the link you intended or did you forget to link to the patch?
>>
>> Thanks
>>
>> On Wed, Dec 15, 2021 at 12:01 PM Bryan Rosander  
>> wrote:
>> >
>> > Hey all,
>> >
>> > I wrote up a utility to patch all nars in a given NiFi install to remove 
>> > JndiLookup.class from log4j jars.  It has no dependencies and the single 
>> > file can be compiled and run as-is.
>> >
>> > It looks like it should be handled pretty well if the class is just 
>> > missing since they didn't expect it to be available on Android. [1]
>> >
>> > It does not attempt to update already unpacked nars so I'd suggest 
>> > stopping NiFi and removing the work/nar directory before running.
>> >
>> > Usage:
>> >
>> > 1. Put by itself in a directory
>> > 2. Compile 'javac Log4jPatch.java'
>> > 3. Run 'java Log4jPatch'
>> >
>> > Verify (optionally do before patch to validate that the grep pattern 
>> > works, you have the vulnerable class file):
>> >
>> > 1. Start NiFi, wait for it to unpack all nars.
>> > 2. Run this in NIFI_HOME: 'find . -iname "*log4j*" | xargs grep -i 
>> > jndilookup.class'
>> >
>> > I'm looking for feedback around the approach.  Anyone's free to take this 
>> > and use it how they want to.
>> >
>> > Thanks,
>> > Bryan
>> >
>> > [1] 
>> > https://github.com/apache/logging-log4j2/blob/rel/2.8.2/log4j-core/src/main/java/org/apache/logging/log4j/core/lookup/Interpolator.java#L100-L106


[ANNOUNCE] Apache NiFi 1.15.1 release

2021-12-15 Thread Joe Witt
Hello

The Apache NiFi team would like to announce the release of Apache NiFi 1.15.1.

This is a bug-fix, improvement, and security-focused release.  The primary
intent is a prompt release which ensures we no longer use log4j 1.x, or any
log4j 2.x artifacts prior to 2.16, and we also update to the latest logback.
But there are a host of other bugs fixed and improvements included.

Apache NiFi is an easy to use, powerful, and reliable system to process
and distribute data.  Apache NiFi was made for dataflow.  It supports
highly configurable directed graphs of data routing, transformation, and
system mediation logic.

More details on Apache NiFi can be found here:
https://nifi.apache.org/

The release artifacts can be downloaded from here:
https://nifi.apache.org/download.html

Maven artifacts have been made available and mirrored as per normal
ASF artifact processes.

Issues closed/resolved for this list can be found here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12351055

Release note highlights can be found here:
https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.15.1

Thank you
The Apache NiFi team


Re: Flowfile disk space is not released from the content-repository until the entire dataflow is completed

2021-12-20 Thread Joe Witt
Vijay

nifi.content.repository.archive.max.retention.period=6 hours
nifi.content.repository.archive.max.usage.percentage=40%

Did you actually run out of disk space?  What error did you get?

We do remove content from the flow file repository when there is no
longer an active flow file that points at that version of content AND
when we need to free up space.

What version are you using?

Thanks

On Mon, Dec 20, 2021 at 10:55 AM Vijay Chhipa  wrote:
>
> Hi all,
>
> We have a use case where we list out the contents of a website and then
> download each item in the list and process it.
> What I expected is that once an item (a file) is downloaded, processing is
> completed, and the flowfile is no longer in any of the queues, the disk
> storage would be released. But what I see is that the content-repo size
> continues to increase as the files are processed. If I pause the flow for
> several hours (over 24 hours) the repo size stays at the increased level and
> does not go down. Only when I clear all the queues does the content-repo size
> go down to the original size (before the flow started).
>
> I am not using provenance and have disabled it.
> Here is the relevant section of the properties file.
>
> I would have been okay with it, but I need to process over 200K files, each
> almost 1 GB in size.
>
> What is holding a reference to these processed flow files, and how can I
> design the dataflow so that the content repo does not fill up?
>
> nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.WriteAheadFlowFileRepository
> nifi.flowfile.repository.wal.implementation=org.apache.nifi.wali.SequentialAccessWriteAheadLog
> nifi.flowfile.repository.directory=/var/foo/bar/flowfile_repository
> nifi.flowfile.repository.partitions=256
> nifi.flowfile.repository.checkpoint.interval=2 mins
> nifi.flowfile.repository.always.sync=false
>
> # Content Repository
> nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
> nifi.content.claim.max.appendable.size=1 MB
> nifi.content.claim.max.flow.files=10
> nifi.content.repository.directory.default=/var/foo/bar/content_repository
> nifi.content.repository.archive.max.retention.period=6 hours
> nifi.content.repository.archive.max.usage.percentage=40%
> nifi.content.repository.archive.enabled=false
> nifi.content.repository.always.sync=false
> nifi.content.viewer.url=../nifi-content-viewer/
>
> # Provenance Repository Properties
> nifi.provenance.repository.implementation=org.apache.nifi.provenance.VolatileProvenanceRepository
> nifi.provenance.repository.debug.frequency=1_000_000
> nifi.provenance.repository.encryption.key.provider.implementation=
> nifi.provenance.repository.encryption.key.provider.location=
> nifi.provenance.repository.encryption.key.id=
> nifi.provenance.repository.encryption.key=
>
> # Persistent Provenance Repository Properties
> nifi.provenance.repository.directory.default=/var/foo/bar/provenance_repository
> nifi.provenance.repository.max.storage.time=24 hours
> nifi.provenance.repository.max.storage.size=1 GB
> nifi.provenance.repository.rollover.time=30 secs
> nifi.provenance.repository.rollover.size=100 MB
> nifi.provenance.repository.query.threads=2
> nifi.provenance.repository.index.threads=2
> nifi.provenance.repository.compress.on.rollover=true
> nifi.provenance.repository.always.sync=false
>
>
> nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, 
> ProcessorID, Relationship
>
> nifi.provenance.repository.indexed.attributes=
>
> nifi.provenance.repository.index.shard.size=500 MB
> nifi.provenance.repository.max.attribute.length=65536
> nifi.provenance.repository.concurrent.merge.threads=2
>
> nifi.provenance.repository.warm.cache.frequency=1 hour
> nifi.provenance.repository.buffer.size=10
>
> Thanks
> Vijay
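
Worth noting: in the configuration quoted above,
nifi.content.repository.archive.enabled=false, and the retention and
usage-percentage properties only govern archived content, so they are not
what drives cleanup here.  Also, since up to
nifi.content.claim.max.flow.files flowfiles can share one content claim, a
claim can only be released once no active flowfile references any content
in it.  A fragment with archiving enabled might look like this (values are
illustrative, not recommendations):

```properties
# Enable archiving so expired content is aged off by the bounds below
nifi.content.repository.archive.enabled=true
# Archived content is deleted once it is older than this...
nifi.content.repository.archive.max.retention.period=6 hours
# ...or once the content repository partition exceeds this usage
nifi.content.repository.archive.max.usage.percentage=40%
```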


Re: Nifi 1.15.1 RPM issue

2021-12-21 Thread Joe Witt
Greg

I don't quite follow what the diff shows, but someone needs to actively
maintain the RPM config as the config for the assembly itself evolves.
The two tend to drift over time as the RPM gets much less attention.

We'll need someone to look into it.  If there isn't a JIRA please do file one.

Thanks

On Tue, Dec 21, 2021 at 12:14 PM Gregory M. Foreman
 wrote:
>
> I did the following:
>
> cd /opt/nifi/nifi-1.15.1/
> mv lib lib.bak
> cp -R 
> ~/sandbox/nifi-1.15.1-build/nifi-1.15.1/nifi-assembly/target/nifi-1.15.1-bin/nifi-1.15.1/lib
>  .
> bin/nifi.sh start
>
> and the server started fine.  Any insights into why the libs would differ?  
> The diffs between the two directories are attached.
>
> Thanks,
> Greg
>
>
> On Dec 21, 2021, at 12:16 PM, Gregory M. Foreman 
>  wrote:
>
> Hello Edward:
>
> SELinux is running in Permissive mode.
>
> Greg
>
> On Dec 18, 2021, at 8:15 AM, Edward Armes  wrote:
>
> Hi Greg,
>
> Can you confirm when you deploy via the RPM and start Nifi that selinux is 
> either running in permissive mode or is disabled
>
> Edward
>
> On Fri, 17 Dec 2021, 20:55 Gregory M. Foreman, 
>  wrote:
>>
>> David:
>>
>> Correct, only the RPM version.  Running nifi.sh start in the regular build 
>> output directory works fine.
>>
>> Greg
>>
>>
>> On Dec 17, 2021, at 3:38 PM, David Handermann  
>> wrote:
>>
>> Gregory,
>>
>> Thanks for the confirmation. So this issue is specific to the RPM build, 
>> correct?  NiFi starts correctly using the tar.gz binary?
>>
>> Regards,
>> David Handermann
>>
>> On Fri, Dec 17, 2021 at 2:23 PM Gregory M. Foreman 
>>  wrote:
>>>
>>> David:
>>>
>>> No modifications were made to the sources or configurations.  I executed 
>>> nifi.sh start and that was all.
>>>
>>> Thanks,
>>> Greg
>>>
>>> On Dec 17, 2021, at 3:14 PM, David Handermann  
>>> wrote:
>>>
>>> Gregory,
>>>
>>> Thanks for reporting this issue.  Do you have any notification services 
>>> configured as part of the bootstrap.conf, such as the HTTP notification 
>>> service?
>>>
>>> Regards,
>>> David Handermann
>>>
>>> On Fri, Dec 17, 2021 at 1:56 PM Gregory M. Foreman 
>>>  wrote:

 Hello:

 I am having trouble with the NiFi 1.15.1 RPM.  The generated code runs 
 when executed directly, but the RPM-deployed version does not run.  I did 
 note a significant difference between the two.  Build output is attached.

 

 mvn -version
 Apache Maven 3.8.4 (9b656c72d54e5bacbed989b64718c159fe39b537)
 Maven home: /opt/maven
 Java version: 1.8.0_312, vendor: Red Hat, Inc., runtime: 
 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.312.b07-1.el7_9.x86_64/jre
 Default locale: en_US, platform encoding: UTF-8
 OS name: "linux", version: "3.10.0-1160.49.1.el7.x86_64", arch: "amd64", 
 family: "unix"

 mvn clean install -Prpm -DskipTests

 yum localinstall 
 nifi-assembly/target/rpm/nifi-bin/RPMS/noarch/nifi-1.15.1-1.el7.noarch.rpm

 /opt/nifi/nifi-1.15.1/bin/nifi.sh start
 nifi.sh: JAVA_HOME not set; results may vary

 Java home:
 NiFi home: /opt/nifi/nifi-1.15.1

 Bootstrap Config File: /opt/nifi/nifi-1.15.1/conf/bootstrap.conf

 Exception in thread "main" java.lang.NoClassDefFoundError: 
 org/apache/nifi/security/util/TlsConfiguration
 at java.lang.ClassLoader.defineClass1(Native Method)
 at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
 at 
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:473)
 at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
 at 
 org.apache.nifi.bootstrap.util.SecureNiFiConfigUtil.configureSecureNiFiProperties(SecureNiFiConfigUtil.java:124)
 at org.apache.nifi.bootstrap.RunNiFi.start(RunNiFi.java:1247)
 at org.apache.nifi.bootstrap.RunNiFi.main(RunNiFi.java:289)
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.nifi.security.util.TlsConfiguration
 at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
 ... 15 more


>>>
>>
>
>
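
One way to make the drift Greg found easy to see is to compare jar listings
between the assembly build's lib directory and the RPM install's, rather
than reading a raw diff.  A small sketch (the paths and function name are
illustrative):

```python
from pathlib import Path


def missing_jars(rpm_lib: str, build_lib: str):
    """Return jar names present in the assembly build's lib directory
    but absent from the RPM install's lib directory."""
    rpm = {p.name for p in Path(rpm_lib).glob("*.jar")}
    build = {p.name for p in Path(build_lib).glob("*.jar")}
    return sorted(build - rpm)
```

For the NoClassDefFoundError above, the jar providing
org.apache.nifi.security.util.TlsConfiguration (a nifi-security-utils jar
in the 1.15 line) would be expected to appear in this list.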

