Re: initiating a machine learning script on a remote server

2020-06-25 Thread Darren Govoni
Quick answer is you could just execute an ssh command to run the script on the
remote machine.

If you need flowfiles to go remote, NiFi supports remote process groups.
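
For illustration, a rough ExecuteStreamCommand configuration along those lines
(host, user, script path and the attribute passed in are all placeholders, and
it assumes key-based SSH is already set up from the NiFi host to the GPU
instance):

Command Path:       ssh
Command Arguments:  nifi@gpu-instance;python3;/opt/ml/run_model.py;${filename}
Argument Delimiter: ;

Whatever the remote script writes to stdout comes back as the content of the
flowfile routed to the "output stream" relationship, so the results can feed
the rest of the pipeline.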




initiating a machine learning script on a remote server

2020-06-25 Thread Mike Sofen
I've been prototyping various functionality on NiFi, initially on a Windows
laptop, now on a single GCP Linux instance (for now), using the more basic
processors for files and databases.  It's really a superb platform.

 

What I now need to solve is how to fire a Python machine learning script that
exists on another CPU/GPU-equipped instance, as part of a pipeline that
detects a new file to process, sends the file name/location to the remote
server, and receives the results of the processing back from the server for
further actions.  We need maximum performance and robustness from this step
of the processing.

 

I've read a bunch of posts on this and they point to using the
ExecuteStreamCommand processor (vs. ExecuteProcess, since it allows
inputs), but none seem to show how to configure the processor to point to a
remote server and execute a script that exists on that server with
arguments/variables I pass in with the call.  These servers will all be GCP
instances.  To keep things simple, let's ignore security for the moment and
assume I own both servers.

 

Can someone point me in the right direction? Many thanks!

 

Mike Sofen



Re: Duplicate Attribute Values in Extract Text Processor Output

2020-06-25 Thread Andy LoPresto
The resulting flowfile will always have at least two attributes because the 
whole match is extracted as an attribute and every capture group is extracted 
as an attribute, and the expression must contain at least one capture group. 

What is the objective you are trying to accomplish? If you want to route 
flowfiles based on their text contents, you can use RouteText. If you want to 
extract text content to attributes, use ExtractText. 

The use case you described above basically retrieves a log file from blob 
storage, splits each file to individual lines, extracts the content of each 
line (minus the final character) into an attribute, and then sends the values 
to Syslog. 

You may want to look at the record processors to improve the performance and 
simplicity of the flow substantially. 


Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
He/Him
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Jun 25, 2020, at 11:53 AM, muhyid72  wrote:
> 
> Hi Andy,
> 
> Thank you for your quick answer and interest. 
> 
> Actually I tried that but there were still 2 attributes on the flowfile. As
> far as I understand it is by design; I can't set just one attribute, it has
> at least 2. Am I right?
> 
> Can I use the RouteText processor instead of ExtractText (I have given my
> ExtractText configuration above)? Do you have any comments?
> 
> 
> 



Re: Duplicate Attribute Values in Extract Text Processor Output

2020-06-25 Thread muhyid72
Hi Andy,

Thank you for your quick answer and interest. 

Actually I tried that but there were still 2 attributes on the flowfile. As
far as I understand it is by design; I can't set just one attribute, it has
at least 2. Am I right?

Can I use the RouteText processor instead of ExtractText (I have given my
ExtractText configuration above)? Do you have any comments?





Re: Replacing a base64-encoded field in a JSON-document with its decoded/converted value

2020-06-25 Thread Andy LoPresto
Hi Bjørn,

No, XML to JSON conversion is not an Expression Language feature. You’ll need 
to either get this data into a flowfile as the complete content to perform the 
conversion with existing built-in tools, or add that step to your Groovy 
script. 

With that additional requirement, I think using the Groovy script to perform 
those steps in tandem is probably the most performant and logical approach 
here. 
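
For what it's worth, a minimal ExecuteScript (Groovy) sketch of those steps in
tandem; it assumes the Base64 field sits at $.data.content as in the sample
file, and that a generic element-to-map conversion of the XML is acceptable:

import groovy.json.JsonSlurper
import groovy.json.JsonOutput
import java.nio.charset.StandardCharsets
import org.apache.nifi.processor.io.StreamCallback

def flowFile = session.get()
if (!flowFile) return

// naive XML -> map conversion; XML attributes and mixed content are ignored for brevity
def toMap
toMap = { node ->
    if (node.children().size() == 0) {
        return node.text()
    }
    def result = [:]
    node.children().each { child ->
        def value = toMap(child)
        if (result.containsKey(child.name())) {
            if (!(result[child.name()] instanceof List)) {
                result[child.name()] = [result[child.name()]]
            }
            result[child.name()] << value
        } else {
            result[child.name()] = value
        }
    }
    result
}

flowFile = session.write(flowFile, { inputStream, outputStream ->
    def json = new JsonSlurper().parse(inputStream)
    // decode the Base64 payload and parse it as XML (groovy.util.XmlSlurper on the Groovy bundled with NiFi 1.x)
    def xmlText = new String(json.data.content.decodeBase64(), StandardCharsets.UTF_8)
    json.data.content = toMap(new XmlSlurper().parseText(xmlText))
    outputStream.write(JsonOutput.toJson(json).getBytes(StandardCharsets.UTF_8))
} as StreamCallback)

session.transfer(flowFile, REL_SUCCESS)

Error handling (routing to REL_FAILURE) and any schema-specific shaping of the
resulting JSON would still need to be added.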


Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
He/Him
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Jun 24, 2020, at 11:25 PM, Myklebust, Bjørn Magnar 
>  wrote:
> 
> Thanks Andy.
> The XML content is around 5 kB-ish.  But I also need to convert the XML to 
> JSON before replacing it back into the original JSON file.  Can this be done 
> with e.g. a ConvertAttribute before the ReplaceText?
>  
> Thanks,
> Bjørn
>  
>  
>  
> From: Andy LoPresto 
> Sent: Wednesday, 24 June 2020 17:24
> To: users@nifi.apache.org
> Subject: Re: Replacing a base64-encoded field in a JSON-document with its 
> decoded/converted value
>  
> Hello Bjørn, 
>  
> If the size of the encoded XML document is small (under ~1 KB), you can 
> extract the Base64-encoded value to a flowfile attribute using 
> EvaluateJSONPath, perform the decoding using the base64Decode Expression 
> Language function [1], and then replace it into the flowfile JSON content 
> using ReplaceText (using some regex like "content": ".*" -> "content": 
> "${decodedXML}" where decodedXML is the name of the attribute you are using). 
>  
> If the XML content could be very large, this will negatively affect your 
> performance, as attributes are stored directly in memory and handling large 
> amounts of data will impact the heap. In this case, I would recommend writing 
> a Groovy script in ExecuteScript processor to leverage Groovy’s very friendly 
> JSON handling and extract the value, Base64 decode it, and replace it in a 
> couple lines. 
>  
> Hope this helps. 
>  
>  
> [1] 
> https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#base64decode
>  
> 
>  
> Andy LoPresto
> alopre...@apache.org 
> alopresto.apa...@gmail.com 
> He/Him
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> 
> 
> On Jun 24, 2020, at 4:24 AM, Myklebust, Bjørn Magnar 
> mailto:bjorn.mykleb...@skatteetaten.no>> 
> wrote:
>  
>  
> Hi.
> I have a set of JSON files which contain a base64-encoded field (the JSONPath to 
> this field is $.data.content), and this field contains an XML document.  
> Decoding the field works as expected, so does the conversion from XML to 
> JSON, and I'm able to write the content from this field to a file in a 
> bucket in S3.  But what I would like to do is to replace the encoded 
> value for this field in the original file with the decoded/converted value 
> instead of writing the decoded/converted value to a file. And after replacing the 
> JSON value I can then write the updated JSON file to a new S3 bucket.
> My process looks like this at the moment, and works fine for getting the data 
> to file, but it's missing the last part of replacing $.data.content with the 
> decoded/converted data.
>  
> So how can I do the last part?
>  
> [The original message included screenshots of the overall flow and of the 
> EvaluateJsonPath, ReplaceText, Base64EncodeContent and ConvertRecord 
> configurations; the images are not preserved in this archive.]
>  
>  
> This is a test file that I'm working with:
>  
> {
>   "header": {
>     "dokumentidentifikator": null,
>     "dokumentidentifikatorV2": "dcff985b-c652-4085-b8f1-45a2f4b6d150",
>     "revisjonsnummer": 1,
>     "dokumentnavn": "Engangsavgiftfastsettelse:55TEST661122334455:44BIL1:2017-10-20",
>     "dokumenttype": "SKATTEMELDING_ENGANGSAVGIFT",
>     "dokumenttilstand": "OPPRETTET",
>     "gyldig": true,
>     "gjelderInntektsaar": 2017,
>     "gjelderPeriode": "2017_10",
>     "gjelderPart": {
>       "partsnummer": 5544332211,
>       "identifiseringstype": "MASKINELL",
>       "identifikator": null
>     },
>     "opphavspart": {
>       "partsnummer": 5544332211,
>       "identifikator": null
>     },
>     "kildereferanse": {
>       "kildesystem": "ENGANGSAVGIFTFASTSETTELSE",
>       "gruppe": "",
>       "referanse": "aef147fb-8ce8-43ef-833b-7aa3bac1ece0",
>       "tidspunkt": "2018-01-16T13:28:02.49+01:00"
>     }
>   },
>   "data": {
>     "metadata": {
>       "format": "ske:fastsetting:motorvogn:motorvognavgift:v1",
>       "bytes": 4420,
>       "mimeType": "application/xml",
>       "sha1": "c0AowOsTdNdo6VufeSsZqTphc0Y="
>     },
>     "content": 
> 

Re: Duplicate Attribute Values in Extract Text Processor Output

2020-06-25 Thread Andy LoPresto
The regex you’re using contains a capture group, and so the entire string is 
captured as one attribute, and then the contained capture groups are also 
extracted as attributes. You can set the property “Include Capture Group 0” to 
false to remove one of them. The others are provided as expected. 
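
For illustration, a configuration along those lines might look like the
following (a sketch; the exact set of attributes produced depends on the
regex):

ExtractText:
  msg = (.*)
  Include Capture Group 0 = false

PutSyslog:
  Message Body = IISHttp${msg}

With "Include Capture Group 0" set to false the whole-match attribute is
suppressed, leaving the attribute named for the property (msg) plus one per
capture group, which here carry the same line of text.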

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
He/Him
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Jun 25, 2020, at 8:27 AM, muhyid72  wrote:
> 
> Dear All
> I need some information about the flowfile attributes produced by the ExtractText processor. 
> My flow is as follows:
> 
> 1. Getting IIS Log files from Azure Blob Storage 
> 2. Splitting each IIS log file line by line with the SplitText processor. 
> 2.1. Line Split Count:1
> 2.2. Maximum Fragment Size: No value set
> 2.3. Header Line Count: 0
> 2.4. Header Line Marker Characters: No value set
> 2.5. Remove Trailing Newlines: True
> 3. Transferring the new flow files produced by the SplitText processor to the
> ExtractText processor. 
> 3.1. All Properties are Default
> 3.2. I added one regex in the properties. I would like to carry the flowfile
> attributes on to Syslog
> 3.2.1. Property Name: msg 
> 3.2.2. Value: (.*). 
> 4. Transferring all flow files coming from ExtractText to the PutSyslog
> processor. 
> 4.1. All Properties are Default or configured properly for requirements
> (such as IP address of the Syslog, port etc.) 
> 4.2. Message Body: IISHttp${msg}
> 
> When I check the flowfile attributes from Data Provenance on the ExtractText
> processor, I see 3 attributes with the same value: 
> Msg: 2020-06-24 13:33:49  GET /Test/Service/test.css
>  200 0 0 852 7005 921
> Msg.1: 2020-06-24 13:33:49  GET /Test/Service/test.css
>  200 0 0 852 7005 921
> Msg.2: 2020-06-24 13:33:49  GET /Test/Service/test.css
>  200 0 0 852 7005 921
> 
> How can I remove the duplicate attributes from the ExtractText output? Or do I
> need to use another approach?
> Do you have any comment or suggestion?
> 
> My environment details are below:
> Apache NiFi 1.11.3
> Windows Server 2016
> Java JRE 1.8.0_241 (64 Bit)
> 
> 
> 



Re: NiFi 1.11 - non-heap size

2020-06-25 Thread Valentina Ivanova
I see. Thanks again!

Valentina

From: Joe Witt 
Sent: Thursday, 25 June 2020 17:54
To: users@nifi.apache.org 
Subject: Re: NiFi 1.11 - non-heap size

Since the max is undefined, the size will grow if more memory is needed, and the 
current usage may ebb and flow, leaving a difference which will/should 
likely always be relatively small.  But things look good/healthy in terms of 
memory on that system as of now.

On Thu, Jun 25, 2020 at 8:51 AM Valentina Ivanova 
mailto:valentina.ivan...@ri.se>> wrote:
Hi Joe,

Thanks for the quick reply!

What about the used and free non-heap - free non-heap is 14 MB only while used 
is 224MB?

Thanks

Valentina

From: Joe Witt mailto:joe.w...@gmail.com>>
Sent: Thursday, 25 June 2020 17:35
To: users@nifi.apache.org 
mailto:users@nifi.apache.org>>
Subject: Re: NiFi 1.11 - non-heap size

-1 in that case means it is on a system or configuration for which the JVM 
cannot get that answer.

Nothing to worry about.

On Thu, Jun 25, 2020 at 8:33 AM Valentina Ivanova 
mailto:valentina.ivan...@ri.se>> wrote:
Hello again!

I have been looking at the system diagnostics and noticed the values for 
non-heap size (screenshot attached). I am running NiFi 1.11 with openjdk 
version 1.8.0_181. It seems I have too little non-heap memory free and the max 
value is set to -1 (which seems strange). Shall I be concerned with these 
values and what can I do in order to improve them?

Many thanks

Valentina


Re: NiFi 1.11 - non-heap size

2020-06-25 Thread Joe Witt
Since the max is undefined, the size will grow if more memory is needed, and
the current usage may ebb and flow, leaving a difference which will/should
likely always be relatively small.  But things look good/healthy in terms of
memory on that system as of now.
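
For reference, those numbers are presumably read from the JVM's MemoryMXBean;
a small Groovy snippet (runnable in groovysh) shows the same view, including
the -1 max when the JVM defines no upper bound for non-heap memory:

import java.lang.management.ManagementFactory

def nonHeap = ManagementFactory.memoryMXBean.nonHeapMemoryUsage
// max is reported as -1 when no non-heap ceiling is defined
println "used=${nonHeap.used} committed=${nonHeap.committed} max=${nonHeap.max}"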

On Thu, Jun 25, 2020 at 8:51 AM Valentina Ivanova 
wrote:

> Hi Joe,
>
> Thanks for the quick reply!
>
> What about the used and free non-heap - free non-heap is 14 MB only while
> used is 224MB?
>
> Thanks
>
> Valentina
> --
> *From:* Joe Witt 
> *Sent:* Thursday, 25 June 2020 17:35
> *To:* users@nifi.apache.org 
> *Subject:* Re: NiFi 1.11 - non-heap size
>
> -1 in that case means it is on a system or configuration for which the JVM
> cannot get that answer.
>
> Nothing to worry about.
>
> On Thu, Jun 25, 2020 at 8:33 AM Valentina Ivanova 
> wrote:
>
> Hello again!
>
> I have been looking at the system diagnostics and noticed the values for
> non-heap size (screenshot attached). I am running NiFi 1.11 with openjdk
> version 1.8.0_181. It seems I have too little non-heap memory free and the
> max value is set to -1 (which seems strange). Shall I be concerned with
> these values and what can I do in order to improve them?
>
> Many thanks
>
> Valentina
>
>


Re: NiFi 1.11 - non-heap size

2020-06-25 Thread Valentina Ivanova
Hi Joe,

Thanks for the quick reply!

What about the used and free non-heap - free non-heap is 14 MB only while used 
is 224MB?

Thanks

Valentina

From: Joe Witt 
Sent: Thursday, 25 June 2020 17:35
To: users@nifi.apache.org 
Subject: Re: NiFi 1.11 - non-heap size

-1 in that case means it is on a system or configuration for which the JVM 
cannot get that answer.

Nothing to worry about.

On Thu, Jun 25, 2020 at 8:33 AM Valentina Ivanova 
mailto:valentina.ivan...@ri.se>> wrote:
Hello again!

I have been looking at the system diagnostics and noticed the values for 
non-heap size (screenshot attached). I am running NiFi 1.11 with openjdk 
version 1.8.0_181. It seems I have too little non-heap memory free and the max 
value is set to -1 (which seems strange). Shall I be concerned with these 
values and what can I do in order to improve them?

Many thanks

Valentina


Re: NiFi 1.11 - non-heap size

2020-06-25 Thread Joe Witt
-1 in that case means it is on a system or configuration for which the JVM
cannot get that answer.

Nothing to worry about.

On Thu, Jun 25, 2020 at 8:33 AM Valentina Ivanova 
wrote:

> Hello again!
>
> I have been looking at the system diagnostics and noticed the values for
> non-heap size (screenshot attached). I am running NiFi 1.11 with openjdk
> version 1.8.0_181. It seems I have too little non-heap memory free and the
> max value is set to -1 (which seems strange). Shall I be concerned with
> these values and what can I do in order to improve them?
>
> Many thanks
>
> Valentina
>


NiFi 1.11 - non-heap size

2020-06-25 Thread Valentina Ivanova
Hello again!

I have been looking at the system diagnostics and noticed the values for 
non-heap size (screenshot attached). I am running NiFi 1.11 with openjdk 
version 1.8.0_181. It seems I have too little non-heap memory free and the max 
value is set to -1 (which seems strange). Shall I be concerned with these 
values and what can I do in order to improve them?

Many thanks

Valentina


Duplicate Attribute Values in Extract Text Processor Output

2020-06-25 Thread muhyid72
Dear All
I need some information about the flowfile attributes produced by the ExtractText processor. 
My flow is as follows:

1. Getting IIS Log files from Azure Blob Storage 
2. Splitting each IIS log file line by line with the SplitText processor. 
2.1. Line Split Count:1
2.2. Maximum Fragment Size: No value set
2.3. Header Line Count: 0
2.4. Header Line Marker Characters: No value set
2.5. Remove Trailing Newlines: True
3. Transferring the new flow files produced by the SplitText processor to the
ExtractText processor. 
3.1. All Properties are Default
3.2. I added one regex in the properties. I would like to carry the flowfile
attributes on to Syslog
3.2.1. Property Name: msg 
3.2.2. Value: (.*). 
4. Transferring all flow files coming from ExtractText to the PutSyslog
processor. 
4.1. All Properties are Default or configured properly for requirements
(such as IP address of the Syslog, port etc.) 
4.2. Message Body: IISHttp${msg}

When I check the flowfile attributes from Data Provenance on the ExtractText
processor, I see 3 attributes with the same value: 
Msg: 2020-06-24 13:33:49  GET /Test/Service/test.css
 200 0 0 852 7005 921
Msg.1: 2020-06-24 13:33:49  GET /Test/Service/test.css
 200 0 0 852 7005 921
Msg.2: 2020-06-24 13:33:49  GET /Test/Service/test.css
 200 0 0 852 7005 921

How can I remove the duplicate attributes from the ExtractText output? Or do I
need to use another approach?
Do you have any comment or suggestion?

My environment details are below:
Apache NiFi 1.11.3
Windows Server 2016
Java JRE 1.8.0_241 (64 Bit)





Re: NiFi 1.11 - High repository storage usage

2020-06-25 Thread Valentina Ivanova
Hi Wesley & Harald,

Thanks for the quick replies!

@Wesley C. Dias de Oliveira I have recently 
increased these settings due to other reasons, so that might be the primary 
cause.

@Harald You are right, they are on the same drive. I assumed that the usage shown 
was relative to the NiFi usage/settings, not the OS. I will now change it so each 
repository gets its own partition.

Thanks!

Valentina

From: Dobbernack, Harald (Key-Work) 
Sent: Thursday, 25 June 2020 15:08
To: users@nifi.apache.org 
Subject: AW: NiFi 1.11 - High repository storage usage


  *   When I check the size of the respective folders on the disk they don't 
even add up to 12,24 GB shown on the screenshot.



I’m guessing you have other stuff on the drive or partition as well and not 
only the repositories.



From: Dobbernack, Harald (Key-Work)
Sent: Thursday, 25 June 2020 14:58
To: users@nifi.apache.org
Subject: AW: NiFi 1.11 - High repository storage usage



Hi,



presumably all three repositories are on the same partition or drive? I 
believe the screenshot view you posted shows what the OS reports as the space 
usage of the whole partition/drive on which the repositories are sitting. Best 
practice would be to place each repository on its own partition.



Greetings

Harald



From: Valentina Ivanova <valentina.ivan...@ri.se>
Sent: Thursday, 25 June 2020 14:50
To: users@nifi.apache.org
Subject: NiFi 1.11 - High repository storage usage



Hello!



I see (screenshot attached) quite high (86%) storage usage for all three Flow 
File, Content & Provenance Repositories in the System Diagnostics. When I check 
the size of the respective folders on the disk they don't even add up to 12,24 
GB shown on the screenshot.

I also have the following settings for the content repository in 
nifi.properties:

nifi.content.repository.archive.max.retention.period=12 hours

nifi.content.repository.archive.max.usage.percentage=50%

nifi.content.repository.archive.enabled=true

nifi.content.repository.always.sync=false





So I am wondering where these numbers are coming from and how to reduce the 
storage utilization (I assume such high utilization will impact the 
performance)? Why is the content repository usage 86% even though the 
max.usage.percentage is set at 50%.



Many thanks in advance & all the best



Valentina









Harald Dobbernack

Key-Work Consulting GmbH | Kriegsstr. 100 | 76133 | Karlsruhe | Germany | 
www.key-work.de | 
Datenschutz
Fon: +49-721-78203-264 | E-Mail: harald.dobbern...@key-work.de | Fax: 
+49-721-78203-10

Key-Work Consulting GmbH, Karlsruhe, HRB 108695, HRG Mannheim
Geschäftsführer: Andreas Stappert, Tobin Wotring


AW: NiFi 1.11 - High repository storage usage

2020-06-25 Thread Dobbernack, Harald (Key-Work)
  *   When I check the size of the respective folders on the disk they don't 
even add up to 12,24 GB shown on the screenshot.

I'm guessing you have other stuff on the drive or partition as well and not 
only the repositories.

From: Dobbernack, Harald (Key-Work)
Sent: Thursday, 25 June 2020 14:58
To: users@nifi.apache.org
Subject: AW: NiFi 1.11 - High repository storage usage

Hi,

presumably all three repositories are on the same partition or drive? I 
believe the screenshot view you posted shows what the OS reports as the space 
usage of the whole partition/drive on which the repositories are sitting. Best 
practice would be to place each repository on its own partition.

Greetings
Harald

From: Valentina Ivanova <valentina.ivan...@ri.se>
Sent: Thursday, 25 June 2020 14:50
To: users@nifi.apache.org
Subject: NiFi 1.11 - High repository storage usage

Hello!

I see (screenshot attached) quite high (86%) storage usage for all three Flow 
File, Content & Provenance Repositories in the System Diagnostics. When I check 
the size of the respective folders on the disk they don't even add up to 12,24 
GB shown on the screenshot.
I also have the following settings for the content repository in 
nifi.properties:
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
nifi.content.repository.archive.enabled=true
nifi.content.repository.always.sync=false


So I am wondering where these numbers are coming from and how to reduce the 
storage utilization (I assume such high utilization will impact the 
performance)? Why is the content repository usage 86% even though the 
max.usage.percentage is set at 50%.

Many thanks in advance & all the best

Valentina





Harald Dobbernack

Key-Work Consulting GmbH | Kriegsstr. 100 | 76133 | Karlsruhe | Germany | 
www.key-work.de | 
Datenschutz
Fon: +49-721-78203-264 | E-Mail: harald.dobbern...@key-work.de | Fax: 
+49-721-78203-10

Key-Work Consulting GmbH, Karlsruhe, HRB 108695, HRG Mannheim
Geschäftsführer: Andreas Stappert, Tobin Wotring


AW: NiFi 1.11 - High repository storage usage

2020-06-25 Thread Dobbernack, Harald (Key-Work)
Hi,

presumably all three repositories are on the same partition or drive? I 
believe the screenshot view you posted shows what the OS reports as the space 
usage of the whole partition/drive on which the repositories are sitting. Best 
practice would be to place each repository on its own partition.
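
For reference, the repository locations are configured in nifi.properties; a
sketch of splitting them across separate mounts (the paths are placeholders):

nifi.flowfile.repository.directory=/mnt/flowfile_repo/flowfile_repository
nifi.content.repository.directory.default=/mnt/content_repo/content_repository
nifi.provenance.repository.directory.default=/mnt/provenance_repo/provenance_repository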

Greetings
Harald

From: Valentina Ivanova 
Sent: Thursday, 25 June 2020 14:50
To: users@nifi.apache.org
Subject: NiFi 1.11 - High repository storage usage

Hello!

I see (screenshot attached) quite high (86%) storage usage for all three Flow 
File, Content & Provenance Repositories in the System Diagnostics. When I check 
the size of the respective folders on the disk they don't even add up to 12,24 
GB shown on the screenshot.
I also have the following settings for the content repository in 
nifi.properties:
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
nifi.content.repository.archive.enabled=true
nifi.content.repository.always.sync=false


So I am wondering where these numbers are coming from and how to reduce the 
storage utilization (I assume such high utilization will impact the 
performance)? Why is the content repository usage 86% even though the 
max.usage.percentage is set at 50%.

Many thanks in advance & all the best

Valentina





Harald Dobbernack

Key-Work Consulting GmbH | Kriegsstr. 100 | 76133 | Karlsruhe | Germany | 
www.key-work.de | 
Datenschutz
Fon: +49-721-78203-264 | E-Mail: harald.dobbern...@key-work.de | Fax: 
+49-721-78203-10

Key-Work Consulting GmbH, Karlsruhe, HRB 108695, HRG Mannheim
Geschäftsführer: Andreas Stappert, Tobin Wotring


Re: NiFi 1.11 - High repository storage usage

2020-06-25 Thread Wesley C. Dias de Oliveira
Hi, Valentina.

I've experienced the same issue on an old installation.

In my case, it was related to memory usage.

The system begins to dump things to the disk when there's no available
memory.

Have you checked the memory params (in conf/bootstrap.conf)?

# JVM memory settings
java.arg.2=-Xms512m
java.arg.3=-Xmx512m
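
For example, to give the JVM more headroom you would raise both values in
conf/bootstrap.conf and restart (the sizes below are only illustrative; size
them to what the host can spare):

java.arg.2=-Xms2g
java.arg.3=-Xmx2g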






Em qui., 25 de jun. de 2020 às 09:50, Valentina Ivanova <
valentina.ivan...@ri.se> escreveu:

> Hello!
>
> I see (screenshot attached) quite high (86%) storage usage for all three
> Flow File, Content & Provenance Repositories in the System Diagnostics.
> When I check the size of the respective folders on the disk they don't even
> add up to 12,24 GB shown on the screenshot.
> I also have the following settings for the content repository in
> nifi.properties:
> nifi.content.repository.archive.max.retention.period=12 hours
> nifi.content.repository.archive.max.usage.percentage=50%
> nifi.content.repository.archive.enabled=true
> nifi.content.repository.always.sync=false
>
>
> So I am wondering where these numbers are coming from and how to reduce
> the storage utilization (I assume such high utilization will impact the
> performance)? Why is the content repository usage 86% even though the
> max.usage.percentage is set at 50%.
>
> Many thanks in advance & all the best
>
> Valentina
>
>
>
>

-- 
Grato,
Wesley C. Dias de Oliveira.

Linux User nº 576838.


NiFi 1.11 - High repository storage usage

2020-06-25 Thread Valentina Ivanova
Hello!

I see (screenshot attached) quite high (86%) storage usage for all three Flow 
File, Content & Provenance Repositories in the System Diagnostics. When I check 
the size of the respective folders on the disk they don't even add up to 12,24 
GB shown on the screenshot.
I also have the following settings for the content repository in 
nifi.properties:
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
nifi.content.repository.archive.enabled=true
nifi.content.repository.always.sync=false


So I am wondering where these numbers are coming from and how to reduce the 
storage utilization (I assume such high utilization will impact the 
performance)? Why is the content repository usage 86% even though the 
max.usage.percentage is set at 50%.

Many thanks in advance & all the best

Valentina





SV: Replacing a base64-encoded field in a JSON-document with its decoded/converted value

2020-06-25 Thread Myklebust , Bjørn Magnar
Thanks Andy.
The XML content is around 5 kB-ish.  But I also need to convert the XML to JSON 
before replacing it back into the original JSON file.  Can this be done with 
e.g. a ConvertAttribute before the ReplaceText?

Thanks,
Bjørn



From: Andy LoPresto 
Sent: Wednesday, 24 June 2020 17:24
To: users@nifi.apache.org
Subject: Re: Replacing a base64-encoded field in a JSON-document with its 
decoded/converted value

Hello Bjørn,

If the size of the encoded XML document is small (under ~1 KB), you can extract 
the Base64-encoded value to a flowfile attribute using EvaluateJSONPath, 
perform the decoding using the base64Decode Expression Language function [1], 
and then replace it into the flowfile JSON content using ReplaceText (using 
some regex like "content": ".*" -> "content": "${decodedXML}" where decodedXML 
is the name of the attribute you are using).
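
For illustration, a sketch of that attribute-based route (property names are
from the standard processors; the attribute name and search regex are
placeholders to adapt to the actual payload):

EvaluateJsonPath:
  Destination = flowfile-attribute
  encodedXML  = $.data.content

ReplaceText:
  Replacement Strategy = Regex Replace
  Search Value         = "content":\s*"[^"]*"
  Replacement Value    = "content": "${encodedXML:base64Decode()}"

Note this only swaps in the decoded XML; the XML-to-JSON conversion discussed
elsewhere in the thread still has to happen separately (e.g. in a Groovy
script).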

If the XML content could be very large, this will negatively affect your 
performance, as attributes are stored directly in memory and handling large 
amounts of data will impact the heap. In this case, I would recommend writing a 
Groovy script in ExecuteScript processor to leverage Groovy’s very friendly 
JSON handling and extract the value, Base64 decode it, and replace it in a 
couple lines.

Hope this helps.


[1] 
https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#base64decode

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
He/Him
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69


On Jun 24, 2020, at 4:24 AM, Myklebust, Bjørn Magnar 
mailto:bjorn.mykleb...@skatteetaten.no>> wrote:


Hi.
I have a set of JSON files which contain a base64-encoded field (the JSONPath to this 
field is $.data.content), and this field contains an XML document.  Decoding the 
field works as expected, so does the conversion from XML to JSON, and I'm able 
to write the content from this field to a file in a bucket in S3.  But what I 
would like to do is to replace the encoded value for this field in the 
original file with the decoded/converted value instead of writing the 
decoded/converted value to a file. And after replacing the JSON value I can then 
write the updated JSON file to a new S3 bucket.
My process looks like this at the moment, and works fine for getting the data to 
file, but it's missing the last part of replacing $.data.content with the 
decoded/converted data.

So how can I do the last part?



[The original message included screenshots of the overall flow and of the 
EvaluateJsonPath, ReplaceText, Base64EncodeContent and ConvertRecord 
configurations; the images are not preserved in this archive.]

This is a test file that I'm working with:

{
  "header": {
    "dokumentidentifikator": null,
    "dokumentidentifikatorV2": "dcff985b-c652-4085-b8f1-45a2f4b6d150",
    "revisjonsnummer": 1,
    "dokumentnavn": "Engangsavgiftfastsettelse:55TEST661122334455:44BIL1:2017-10-20",
    "dokumenttype": "SKATTEMELDING_ENGANGSAVGIFT",
    "dokumenttilstand": "OPPRETTET",
    "gyldig": true,
    "gjelderInntektsaar": 2017,
    "gjelderPeriode": "2017_10",
    "gjelderPart": {
      "partsnummer": 5544332211,
      "identifiseringstype": "MASKINELL",
      "identifikator": null
    },
    "opphavspart": {
      "partsnummer": 5544332211,
      "identifikator": null
    },
    "kildereferanse": {
      "kildesystem": "ENGANGSAVGIFTFASTSETTELSE",
      "gruppe": "",
      "referanse": "aef147fb-8ce8-43ef-833b-7aa3bac1ece0",
      "tidspunkt": "2018-01-16T13:28:02.49+01:00"
    }
  },
  "data": {
    "metadata": {
      "format": "ske:fastsetting:motorvogn:motorvognavgift:v1",
      "bytes": 4420,
      "mimeType": "application/xml",
      "sha1": "c0AowOsTdNdo6VufeSsZqTphc0Y="
    },
    "content": 

Re: Indications in the UI of which cluster node hosts a “stuck” thread?

2020-06-25 Thread James McMahon
This does help, thank you Matt. And I like your suggestion. It would be
more at our fingertips if, as we hover over the thread count on a
processor, the distribution across all cluster nodes were presented in a
popup. I wonder if the project leads would consider this a helpful improvement?

I can now see that my hanging threads are on just two of my cluster nodes.
This is very helpful - thanks again. It reduces the amount of thread
dumping review I will be doing today.

Jim
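
(For reference, the nifi.sh thread dump mentioned below can be captured per
node with something like the following; the output path is a placeholder:

./bin/nifi.sh dump /tmp/nifi-thread-dump.txt)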

On Wed, Jun 24, 2020 at 9:53 PM Matt Gilman  wrote:

> Hi Jim,
>
> If you open the Summary page from the global menu you should see the
> active threads in parentheses next to the scheduled state. Find the row in
> question and click the cluster icon from the actions column. This will open
> a dialog with a node-wise breakdown. I believe that the thread count is one
> of the metrics that is broken down per node.
>
> Hope this helps! Adding this breakdown to the main canvas would be a great
> addition. Maybe these breakdowns could be offered in a tooltip for each
> metric.
>
> Matt
>
> Sent from my iPhone
>
> > On Jun 24, 2020, at 21:05, James McMahon  wrote:
> >
> > 
> > Our production NiFi cluster is exhibiting repeated problems with threads
> that do not end. It is happening with processors that have complex
> configurations and dependencies (ConsumeAMQP), and - more troubling - it is
> also occurring periodically for simple processors like ControlRate. I'll
> have a ControlRate processor sitting in a running state with no active running
> thread, I select Stop on that processor, get a thread I presume to be
> responsible for stopping the processor, and that thread will never end.
> This renders my processor in a useless state - not stopped, not really
> running, and not accessible to reconfigure.
> >
> > I read a blog by Pierre Villard on using nifi.sh for thread dumps. I’ll
> dig into that. My questions:
> >
> > 1. In a cluster, is there anything I can use in the UI to tell me which
> cluster node hosts the bad thread? Digging through thread dumps from
> multiple cluster nodes seems impractical, and I’m hoping there’s a way to
> zero in on a node.
> >
> > 2. What nifi system resources in my configuration influence the
> management and well-being of these threads?
> >
> > 3. Has anyone debugged such a thread issue in a clustered nifi
> environment, and if so can you offer any tips based on your experience?
> >
> > Thanks in advance for any help.
> > Jim
>