Containerizing, clustering, Registry - how to begin?

2024-07-06 Thread James McMahon
I would like to build a containerized, scalable, highly available NiFi
architecture - likely Docker, likely on EC2s. I intend to have dev, int,
and prod containerized groups, version controlled through NiFi Registry. I
am trying to understand how NiFi clusters, Registry, and containerization
get used together in such a complex architecture.



Does one first containerize each NiFi node and each ZooKeeper node
independently, and then group them into a NiFi cluster as described in the
Apache NiFi Admin Guide, here:
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#clustering?
There doesn't seem to be much in there about clustering combined with
containerization.


Am I thinking about this wrong? Does one abandon traditional NiFi
clustering and instead achieve scalability through Dockerized containers,
each running a single NiFi instance, with a load balancer in front of the
containers to distribute load - so no traditional NiFi cluster, no
ZooKeeper cluster, no coordination between NiFi nodes?



How well does NiFi Registry function when the underlying NiFi clusters are
containerized?


Has anyone accomplished anything similar? Could you help me understand how
to build out such a complex architecture: NiFi nodes in containers,
zookeeper nodes in containers, all clustered, NiFi Registry lording over
all for version control and process group promotion?

This may be helpful to get started:

https://sandundayananda.medium.com/deploy-apache-nifi-on-docker-with-aws-ec2-instance-and-connect-to-web-interface-3e516e06fe04

But it doesn't speak to clustering or to NiFi Registry in a containerized,
clustered architecture.

Thanks in advance for any thoughts.
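
For what it's worth, here is roughly what a throwaway, single-host version
of "NiFi nodes plus ZooKeeper in containers" can look like. This is only a
sketch for local experimentation, not a production pattern: it assumes the
unsecured HTTP mode of the apache/nifi 1.x convenience image, and the
environment variable names are the ones documented for that image's start
scripts (verify them against the image docs for your version). TLS,
persistent volumes, and multi-host networking are all omitted, and image
tags and ports are illustrative assumptions.

# Hedged sketch only: one ZooKeeper container, two clustered NiFi nodes, and
# a NiFi Registry container on a shared Docker network.
docker network create nifi-net

docker run -d --name zookeeper --hostname zookeeper --network nifi-net zookeeper:3.8

for i in 1 2; do
  docker run -d --name nifi-$i --hostname nifi-$i --network nifi-net \
    -e NIFI_WEB_HTTP_PORT=8080 \
    -e NIFI_CLUSTER_IS_NODE=true \
    -e NIFI_CLUSTER_NODE_PROTOCOL_PORT=8082 \
    -e NIFI_ZK_CONNECT_STRING=zookeeper:2181 \
    -e NIFI_ELECTION_MAX_WAIT="1 min" \
    -p 809$i:8080 \
    apache/nifi:1.25.0
  # each node joins the cluster by registering itself in ZooKeeper
done

# NiFi Registry for version control / process group promotion (dev, int, prod)
docker run -d --name nifi-registry --network nifi-net -p 18080:18080 \
  apache/nifi-registry:latest

As far as I know, Registry itself does not care whether the NiFi nodes that
talk to it are containerized; each cluster just needs a reachable Registry
URL configured in its Registry client. So a layout like the above should
extend to one container per EC2 host (or to ECS/EKS task definitions) once
TLS, persistent volumes, and real networking are addressed.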


ConsumeAMQP not delivering message from RMQ to Success

2024-07-05 Thread James McMahon
I have a queue in RabbitMQ that I try to consume with ConsumeAMQP,
configured like this:

Queue: detection-responses
Auto-Acknowledge Messages: false
Batch Size: 1
Prefetch Count: 0
Header Output Format: Comma-Separated String
Header Separator: ,
Remove Curly Braces: False
Brokers: No value set
Host Name: 18.235.119.166
Port: 5672
Virtual Host: /
User Name: Admin
Password: Sensitive value set
AMQP Version: 0.9.1
SSL Context Service: No value set
Use Client Certificate Authentication: false

My messages are not successfully consumed into my NiFi flow. This is what I
find in my log:

17:23:39 UTC INFO  ConsumeAMQP[id=1134134d-1ec5-158c-c3db-526be43cd3fa]
Successfully connected AMQPConsumer to amqp://Admin@18.235.119.166:5672/ and
'detection-responses' queue

17:23:39 UTC DEBUG ConsumeAMQP[id=1134134d-1ec5-158c-c3db-526be43cd3fa]
Closing AMQP channel for amqp://Admin@18.235.119.166:5672/

17:23:39 UTC INFO  ConsumeAMQP[id=1134134d-1ec5-158c-c3db-526be43cd3fa]
Consumer is closed, discarding message (delivery tag: 1).




Is this because the channel is closing prematurely? Is there a queue
property I must set to keep it open?

Thank you for any help.


Re: Accessing File Size in ExecuteGroovyScript script

2024-06-17 Thread James McMahon
It appears the best way to do this is to apply the Java getSize() method to
the ff object, like this:
def ffSize = ff.getSize()

This returns the file size in bytes, as a long.

def file_size = (ffSize != null) ? ffSize : 0

if (file_size == 0) {
log.error("file_size is undefined or zero, which prevents division.")
session.transfer(ff, REL_FAILURE)
return
}

log.info("File size: ${file_size}")



On Mon, Jun 17, 2024 at 8:31 AM James McMahon  wrote:

> I should mention that I have also tried to access fileSize like this
> without success:
>
> //def file_size = (ff.size != null) ? (ff.size as Integer) : 0
> def file_size = ff.getAttribute('fileSize')
>
> This returns null.
>
> I tried this approach based on a post in cloudera, which said this:
>
> |The default FlowFile attributes include:
>
> entryDate
>
> lineageStartDate
>
> fileSize
>
> filename
>
> path
>
> uuid
>
>
> On Mon, Jun 17, 2024 at 7:50 AM James McMahon 
> wrote:
>
>> I am trying to use the file size. On the Details tab for my flowfile in
>> queue, I see that my File Size is 8.01 GB.
>>
>> I log the following from this section of a Groovy script, running in an
>> ExecuteGroovyScript processor:
>>
>> def ff = session.get()
>> if (!ff) return
>>
>> def jsonFactory = new JsonFactory()
>> log.info("JsonFactory created successfully.")
>>
>> def numberOfObjects = 0
>> def lineCounter = 0  // Initialize a counter to track the number of lines
>>
>> // Ensure file_size is not null and cast to Integer
>> def file_size = (ff.size != null) ? (ff.size as Integer) : 0
>>
>> if (file_size == 0) {
>> log.error("file_size is undefined or zero, which prevents division.")
>> session.transfer(ff, REL_FAILURE)
>> return
>> }
>>
>> log.info("File size: ${file_size}")
>>
>>
>> I want file_size to always be the number of bytes in the file so that I
>> am always working with a consistent representation.
>>
>> But in my log, the result is this:
>> ExecuteGroovyScript[id=1110134d-1ea1-1565-f962-eee47a3fc654] File size:
>> 15408597
>>
>> That isn't 8.01 GB. Where am I making my error?
>>
>>


Re: Accessing File Size in ExecuteGroovyScript script

2024-06-17 Thread James McMahon
I should mention that I have also tried to access fileSize like this
without success:

//def file_size = (ff.size != null) ? (ff.size as Integer) : 0
def file_size = ff.getAttribute('fileSize')

This returns null.

I tried this approach based on a post in cloudera, which said this:

|The default FlowFile attributes include:

entryDate

lineageStartDate

fileSize

filename

path

uuid


On Mon, Jun 17, 2024 at 7:50 AM James McMahon  wrote:

> I am trying to use the file size. On the Details tab for my flowfile in
> queue, I see that my File Size is 8.01 GB.
>
> I log the following from this section of a Groovy script, running in an
> ExecuteGroovyScript processor:
>
> def ff = session.get()
> if (!ff) return
>
> def jsonFactory = new JsonFactory()
> log.info("JsonFactory created successfully.")
>
> def numberOfObjects = 0
> def lineCounter = 0  // Initialize a counter to track the number of lines
>
> // Ensure file_size is not null and cast to Integer
> def file_size = (ff.size != null) ? (ff.size as Integer) : 0
>
> if (file_size == 0) {
> log.error("file_size is undefined or zero, which prevents division.")
> session.transfer(ff, REL_FAILURE)
> return
> }
>
> log.info("File size: ${file_size}")
>
>
> I want file_size to always be the number of bytes in the file so that I am
> always working with a consistent representation.
>
> But in my log, the result is this:
> ExecuteGroovyScript[id=1110134d-1ea1-1565-f962-eee47a3fc654] File size:
> 15408597
>
> That isn't 8.01 GB. Where am I making my error?
>
>


Accessing File Size in ExecuteGroovyScript script

2024-06-17 Thread James McMahon
I am trying to use the file size. On the Details tab for my flowfile in
queue, I see that my File Size is 8.01 GB.

I log the following from this section of a Groovy script, running in an
ExecuteGroovyScript processor:

def ff = session.get()
if (!ff) return

def jsonFactory = new JsonFactory()
log.info("JsonFactory created successfully.")

def numberOfObjects = 0
def lineCounter = 0  // Initialize a counter to track the number of lines

// Ensure file_size is not null and cast to Integer
def file_size = (ff.size != null) ? (ff.size as Integer) : 0

if (file_size == 0) {
log.error("file_size is undefined or zero, which prevents division.")
session.transfer(ff, REL_FAILURE)
return
}

log.info("File size: ${file_size}")


I want file_size to always be the number of bytes in the file so that I am
always working with a consistent representation.

But in my log, the result is this:
ExecuteGroovyScript[id=1110134d-1ea1-1565-f962-eee47a3fc654] File size:
15408597

That isn't 8.01 GB. Where am I making my error?


Re: FlattenJSON fails on large json file

2024-06-14 Thread James McMahon
Thanks Eric. So the java.lang.OutOfMemoryError in the message isn't really
to be taken at face value. FlattenJson tried to build an array longer than
the maximum value of an integer and choked: the requested length, 2147483639
+ 9 = 2147483648, is just past Integer.MAX_VALUE (2147483647), the largest
array Java can allocate, regardless of how much heap is available.

An 8 GB file really isn't that large. I'm hoping someone has encountered
this before and will weigh in with a reply.

On Fri, Jun 14, 2024 at 2:08 PM Eric Secules  wrote:

> Hi James,
>
> I don't have a solution for you off the top of my head. But I can tell you
> the failure is because you've got an array longer than the maximum value of
> an Int. So, memory is not the limiting factor.
>
> -Eric
>
> On Fri, Jun 14, 2024, 10:59 AM James McMahon  wrote:
>
>> I have a json file, incoming.json. It is 9 GB in size.
>>
>> I want to flatten the json so that I can tabulate the number of times
>> each key appears. Am using a FlattenJson 2.0.0-M2 processor, with
>> this configuration:
>>
>> Separator: .
>> Flatten Mode: normal
>> Ignore Reserved Characters: false
>> Return Type: flatten
>> Character Set: UTF-8
>> Pretty Print JSON: true
>>
>> This processor has worked so far on json files as large as 2 GB. But this
>> 9 GB one is causing this issue:
>>
>> FlattenJson[id=ea2650e2-8974-1ff7-2da9-a0f2cd303258] Processing halted: 
>> yielding [1 sec]: java.lang.OutOfMemoryError: Required array length 
>> 2147483639 + 9 is too large
>>
>>
>> htop confirms I have 92 GB of memory on my EC2 instance, and the NiFi heap
>> shows it has 88 GB of that dedicated for its use.
>>
>>
>> How can I handle large json files in this processor? It would seem that 
>> breaking the file up is not an option because it will violate the integrity 
>> of the json structure most likely.
>>
>>
>> What options do I have?
>>
>>


FlattenJSON fails on large json file

2024-06-14 Thread James McMahon
I have a json file, incoming.json. It is 9 GB in size.

I want to flatten the json so that I can tabulate the number of times each
key appears. Am using a FlattenJson 2.0.0-M2 processor, with
this configuration:

Separator: .
Flatten Mode: normal
Ignore Reserved Characters: false
Return Type: flatten
Character Set: UTF-8
Pretty Print JSON: true

This processor has worked so far on json files as large as 2 GB. But this 9
GB one is causing this issue:

FlattenJson[id=ea2650e2-8974-1ff7-2da9-a0f2cd303258] Processing
halted: yielding [1 sec]: java.lang.OutOfMemoryError: Required array
length 2147483639 + 9 is too large


htop confirms I have 92 GB of memory on my EC2 instance, and the NiFi
heap shows it has 88 GB of that dedicated for its use.


How can I handle large json files in this processor? It would seem
that breaking the file up is not an option because it will violate the
integrity of the json structure most likely.


What options do I have?


Re: Insufficient permissions on initial start up (NiFi 2.0)

2024-04-25 Thread James McMahon
When I learned that the initial user in authorizers.xml must match the
certificate *exactly*, I figured it would be an easy matter to use openssl
to inspect the cert. I wanted to be *absolutely* certain I matched it
correctly.
Here is the command:
/opt/nifi/config_resources/keys$ openssl pkcs12 -info -in CN=admin2.p12
-nodes

Here is the output. The output informed me of what I thought was the
subject info in the cert:
..
subject=C = US, ST = Virginia, L = Reston, O = C4 Rampart, OU = NIFI, CN =
admin2
issuer=C = US, ST = Virginia, L = Reston, O = C4 Rampart, OU = Secure
Digital Certificate Signing, CN = C4 Rampart CA
-BEGIN CERTIFICATE-
MIIExDCCA6ygAwIBAgICeYowDQYJKoZI..

This is why I put that in the way that I did. I mean why on earth would it
reverse the DN info and add extra spaces, right!? It's like having a
separate knob for volume on the alarm clock in Seinfeld. Why separate knob,
WHY?

Anyway I reversed my entry and squashed the extra spacing. I updated
authorizers.xml. I blew away authorizations.xml and users.xml so that nifi
would recreate them at startup. I also fixed
nifi.security.identity.mapping.pattern.dn and
nifi.security.identity.mapping.value.dn in nifi.properties. I restarted
nifi. And as Bryan, Matt, and Isha certainly already suspect, it worked.
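
For anyone hitting the same thing, those properties follow the general shape
of the stock example in the admin guide; the pattern and value you need
depend on your DN and on how much of it you want to map, so treat this as a
sketch rather than my exact working config:

# nifi.properties - DN identity mapping (illustrative values)
nifi.security.identity.mapping.pattern.dn=^CN=(.*?), OU=(.*?), O=(.*?), L=(.*?), ST=(.*?), C=(.*?)$
nifi.security.identity.mapping.value.dn=$1
nifi.security.identity.mapping.transform.dn=NONE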

I still have one more thing to figure out. I've got the CA and user cert
info in my browser. I've got a server cert. All have been generated by
tinycert.org. Yet at the https URL in the browser, it still tells me it is
insecure. I do not understand why.
x Not secure  https://ec2-44-219-227-80.compute-1.amazonaws.com:8443/nifi/

Anyway, thank you Bryan, Matt, and Isha for replying.

On Wed, Apr 24, 2024 at 9:05 PM Matt Gilman  wrote:

> What is this Access Token it cites at top?
>>
>
> NiFi UI attempts to get the access token expiration. However, since you're
> authenticating with a certificate the endpoint returns an IllegalState
> because there was no token in the request.
>
> Looking at the logs and the supplied configuration it appears there are
> spaces and the ordering is reversed in your initial admin identity when
> compared with the value from the certificate.
>
> C = US, ST = Virginia, L = Reston, O = C4 Rampart, OU = NIFI, CN = admin2
> CN=admin2, OU=NIFI, O=C4 Rampart, L=Reston, ST=Virginia, C=US
>
> Hope this helps!
>
> Matt
>
> On Wed, Apr 24, 2024 at 8:41 PM James McMahon 
> wrote:
>
>> Looking at the nifi-user.log, I find I am getting a Conflict response,
>> Access Token not found.
>>
>>  more ./nifi-user.log
>> 2024-04-25 00:23:49,329 INFO [main] o.a.n.a.FileUserGroupProvider
>> Creating new users file at /opt/nifi/config_resources/users.xml
>> 2024-04-25 00:23:49,352 INFO [main] o.a.n.a.FileAccessPolicyProvider
>> Creating new authorizations file at
>> /opt/nifi/config_resources/authorizations.xml
>> 2024-04-25 00:23:49,573 INFO [main] o.a.n.a.FileAccessPolicyProvider
>> Populating authorizations for Initial Admin: C = US, ST = Virginia, L =
>> Reston, O = C4 Rampart, OU = NIFI,
>> CN = admin2
>> 2024-04-25 00:24:51,107 INFO [NiFi Web Server-100]
>> o.a.n.w.s.NiFiAuthenticationFilter Authentication Started 173.73.40.110
>> [CN=admin2, OU=NIFI, O=C4 Rampart, L=Reston, ST=Virgi
>> nia, C=US] POST
>> https://ec2-44-219-227-80.compute-1.amazonaws.com:8443/nifi-api/access/kerberos
>> 2024-04-25 00:24:51,110 INFO [NiFi Web Server-100]
>> o.a.n.w.s.NiFiAuthenticationFilter Authentication Success [CN=admin2,
>> OU=NIFI, O=C4 Rampart, L=Reston, ST=Virginia, C=US] 173
>> .73.40.110 POST
>> https://ec2-44-219-227-80.compute-1.amazonaws.com:8443/nifi-api/access/kerberos
>> 2024-04-25 00:24:51,166 INFO [NiFi Web Server-104]
>> o.a.n.w.s.NiFiAuthenticationFilter Authentication Started 173.73.40.110
>> [CN=admin2, OU=NIFI, O=C4 Rampart, L=Reston, ST=Virgi
>> nia, C=US] GET
>> https://ec2-44-219-227-80.compute-1.amazonaws.com:8443/nifi-api/access/token/expiration
>> 2024-04-25 00:24:51,166 INFO [NiFi Web Server-104]
>> o.a.n.w.s.NiFiAuthenticationFilter Authentication Success [CN=admin2,
>> OU=NIFI, O=C4 Rampart, L=Reston, ST=Virginia, C=US] 173
>> .73.40.110 GET
>> https://ec2-44-219-227-80.compute-1.amazonaws.com:8443/nifi-api/access/token/expiration
>> 2024-04-25 00:24:51,172 WARN [NiFi Web Server-104]
>> o.a.n.w.a.c.IllegalStateExceptionMapper java.lang.IllegalStateException:
>> *Access Token not found. Returning Conflict
>> response.java.lang.IllegalStateException: Access Token not found*
>> at
>> org.apache.nifi.web.api.AccessResource.getAccessTokenExpiration(AccessResource.java:463)
>> at
>> java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)

Re: Insufficient permissions on initial start up (NiFi 2.0)

2024-04-24 Thread James McMahon
Looking at the nifi-user.log, I find I am getting a Conflict response,
Access Token not found.

 more ./nifi-user.log
2024-04-25 00:23:49,329 INFO [main] o.a.n.a.FileUserGroupProvider Creating
new users file at /opt/nifi/config_resources/users.xml
2024-04-25 00:23:49,352 INFO [main] o.a.n.a.FileAccessPolicyProvider
Creating new authorizations file at
/opt/nifi/config_resources/authorizations.xml
2024-04-25 00:23:49,573 INFO [main] o.a.n.a.FileAccessPolicyProvider
Populating authorizations for Initial Admin: C = US, ST = Virginia, L =
Reston, O = C4 Rampart, OU = NIFI,
CN = admin2
2024-04-25 00:24:51,107 INFO [NiFi Web Server-100]
o.a.n.w.s.NiFiAuthenticationFilter Authentication Started 173.73.40.110
[CN=admin2, OU=NIFI, O=C4 Rampart, L=Reston, ST=Virgi
nia, C=US] POST
https://ec2-44-219-227-80.compute-1.amazonaws.com:8443/nifi-api/access/kerberos
2024-04-25 00:24:51,110 INFO [NiFi Web Server-100]
o.a.n.w.s.NiFiAuthenticationFilter Authentication Success [CN=admin2,
OU=NIFI, O=C4 Rampart, L=Reston, ST=Virginia, C=US] 173
.73.40.110 POST
https://ec2-44-219-227-80.compute-1.amazonaws.com:8443/nifi-api/access/kerberos
2024-04-25 00:24:51,166 INFO [NiFi Web Server-104]
o.a.n.w.s.NiFiAuthenticationFilter Authentication Started 173.73.40.110
[CN=admin2, OU=NIFI, O=C4 Rampart, L=Reston, ST=Virgi
nia, C=US] GET
https://ec2-44-219-227-80.compute-1.amazonaws.com:8443/nifi-api/access/token/expiration
2024-04-25 00:24:51,166 INFO [NiFi Web Server-104]
o.a.n.w.s.NiFiAuthenticationFilter Authentication Success [CN=admin2,
OU=NIFI, O=C4 Rampart, L=Reston, ST=Virginia, C=US] 173
.73.40.110 GET
https://ec2-44-219-227-80.compute-1.amazonaws.com:8443/nifi-api/access/token/expiration
2024-04-25 00:24:51,172 WARN [NiFi Web Server-104]
o.a.n.w.a.c.IllegalStateExceptionMapper java.lang.IllegalStateException:
*Access Token not found. Returning Conflict
response.java.lang.IllegalStateException: Access Token not found*
at
org.apache.nifi.web.api.AccessResource.getAccessTokenExpiration(AccessResource.java:463)
at
java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
at java.base/java.lang.reflect.Method.invoke(Method.java:580)

followed by

2024-04-25 00:24:51,185 INFO [NiFi Web Server-100]
o.a.n.w.s.NiFiAuthenticationFilter Authentication Started 173.73.40.110
[CN=admin2, OU=NIFI, O=C4 Rampart, L=Reston, ST=Virgi
nia, C=US] GET
https://ec2-44-219-227-80.compute-1.amazonaws.com:8443/nifi-api/flow/current-user
2024-04-25 00:24:51,186 INFO [NiFi Web Server-100]
o.a.n.w.s.NiFiAuthenticationFilter Authentication Success [CN=admin2,
OU=NIFI, O=C4 Rampart, L=Reston, ST=Virginia, C=US] 173
.73.40.110 GET
https://ec2-44-219-227-80.compute-1.amazonaws.com:8443/nifi-api/flow/current-user
2024-04-25 00:24:51,192 INFO [NiFi Web Server-100]
o.a.n.w.a.c.AccessDeniedExceptionMapper identity[CN=admin2, OU=NIFI, O=C4
Rampart, L=Reston, ST=Virginia, C=US], groups[]
*does not have permission to access the requested resource. Unable to view
the user interface. Returning Forbidden response.*

What is this Access Token it cites at top?

Based on what I have read in the documentation, nifi itself must be allowed
to create the authorizations.xml file at initial startup. Is there a
reason it would omit permission to view and use the UI?

On Wed, Apr 24, 2024 at 5:18 PM Matt Gilman  wrote:

> James,
>
> If you check the nifi-user.log in the logs directory, you should see
> messages for the requests that are being rejected. In that log message you
> should see the identity that you're authenticated with. Can you compare
> that with the user that you've configured the policies for. Hopefully, that
> will help point to where the issue is.
>
> Matt
>
> On Wed, Apr 24, 2024 at 5:03 PM James McMahon 
> wrote:
>
>> I still cannot access my own NiFi 2.0 instance. I continue to get this
>> rejection:
>>
>> Insufficient Permissions
>>
>>- home
>>
>> Unable to view the user interface. Contact the system administrator.
>>
>>
>> The canvas flashes for an instant when I try to hit my secure URL, but is
>> immediately replaced with this rejection message.
>>
>> There is no error or warning in nifi-app.log
>>
>> Has anyone experienced a similar problem?
>>
>>
>> Here is my authorizers.xml:
>>
>> 
>> 
>> file-user-group-provider
>> org.apache.nifi.authorization.FileUserGroupProvider
>> /opt/nifi/config_resources/users.xml
>> 
>> C = US, ST = Virginia, L
>> = Reston, O = C4 Rampart, OU = NIFI, CN = admin2
>> 
>> 
>> file-access-policy-provider
>>
>> org.apache.nifi.authorization.FileAccessPolicyProvider
>> file-user-group-provider
>> /opt/nifi/config_resources/authorizations.xml

Re: Insufficient permissions on initial start up (NiFi 2.0)

2024-04-24 Thread James McMahon
I still cannot access my own NiFi 2.0 instance. I continue to get this
rejection:

Insufficient Permissions

   - home

Unable to view the user interface. Contact the system administrator.


The canvas flashes for an instant when I try to hit my secure URL, but is
immediately replaced with this rejection message.

There is no error or warning in nifi-app.log

Has anyone experienced a similar problem?


Here is my authorizers.xml:



file-user-group-provider
org.apache.nifi.authorization.FileUserGroupProvider
/opt/nifi/config_resources/users.xml

C = US, ST = Virginia, L =
Reston, O = C4 Rampart, OU = NIFI, CN = admin2


file-access-policy-provider

org.apache.nifi.authorization.FileAccessPolicyProvider
file-user-group-provider
/opt/nifi/config_resources/authorizations.xml
C = US, ST = Virginia, L =
Reston, O = C4 Rampart, OU = NIFI, CN = admin2




managed-authorizer

org.apache.nifi.authorization.StandardManagedAuthorizer
file-access-policy-provider




Here is my authorizations.xml (nifi creates at first startup):













































Here is my users.xml (nifi creates at first startup):










On Wed, Apr 24, 2024 at 8:21 AM James McMahon  wrote:

> I'll review this closely once again when I get back to this system tonight
> - thanks very much for your reply, Isha.
>
> I also feel I need to look more closely in nifi.properties, at values I
> have set for keys nifi.security.identity.mapping.[value, transform,
> pattern].CN1
>
> I noticed some odd behavior and suspect it is a reflection of an issue I
> have not set properly in my configuration:
> The first time I started my 2.0 instance with my Initial Admin Identity
> defined as shown, the UI in my browser actually presented me with a list
> (of one) Personal cert to select from - the cert for admin2. I was in a
> happy place: *finally*, nifi and the browser appeared to be in synch for
> the Subject name in the cert.
>
> I selected this cert, but then was crushed by the rejection mentioned
> above:
>  Unable to view the user interface. Contact the system administrator.
>  Insufficient Permissions home
>
> I restarted nifi so I could "tail -f" nifi-app.log.
> After restart, I once again tried to hit my NiFi URL.
> This time though, the browser failed to present the admin2 cert for
> selection.  Shouldn't it have still presented that to me in the browser for
> my selection?
> Do you have any thoughts why this behavior is occurring?
>
> Would you say it is advisable to manually create an
> authorizations.xml file should I continue to experience Insufficient
> Permissions problems? I recall reading that users.xml and
> authorizations.xml - if absent at initial startup - should be created by
> nifi from info in authorizers.xml. But this Insufficient Permissions makes
> me suspect something is missing from authorizations.
>
> Jim
>
> On Wed, Apr 24, 2024 at 5:33 AM Isha Lamboo <
> isha.lam...@virtualsciences.nl> wrote:
>
>> Hi James,
>>
>>
>>
>> Have you changed these settings in authorizers.xml since you first
>> started NiFi? If so, you may need to delete users.xml and
>> authorizations.xml.
>>
>> A new admin user will not be created if those files already exist.
>>
>>
>>
>> Otherwise, the trickiest part is usually that the user DN needs to match *
>> *exactly** with that specified. Capitals and whitespace matter. Since
>> you are getting insufficient permissions instead of unknown user, I don’t
>> think that’s your problem here. Still, it may be worth checking for a
>> mismatch in the initial admin identity vs initial user identity vs
>> certificate.
>>
>>
>>
>> Regards,
>>
>>
>>
>> Isha
>>
>>
>>
>> *Van:* James McMahon 
>> *Verzonden:* woensdag 24 april 2024 02:14
>> *Aan:* users 
>> *Onderwerp:* Insufficient permissions on initial start up (NiFi 2.0)
>>
>>
>>
>> I am trying to start my new NiFi 2.0 installation. I have a user admin2
>> that has a cert. The nifi server also has a cert. Both are signed by the
>> same CA.
>>
>>
>>
>> At start up in my browser I am denied due to insufficient privileges:
>>
>>
>>
>> Unab

Re: Insufficient permissions on initial start up (NiFi 2.0)

2024-04-24 Thread James McMahon
I'll review this closely once again when I get back to this system tonight
- thanks very much for your reply, Isha.

I also feel I need to look more closely in nifi.properties, at values I
have set for keys nifi.security.identity.mapping.[value, transform,
pattern].CN1

I noticed some odd behavior and suspect it is a reflection of an issue I
have not set properly in my configuration:
The first time I started my 2.0 instance with my Initial Admin Identity
defined as shown, the UI in my browser actually presented me with a list
(of one) Personal cert to select from - the cert for admin2. I was in a
happy place: *finally*, nifi and the browser appeared to be in synch for
the Subject name in the cert.

I selected this cert, but then was crushed by the rejection mentioned above:
 Unable to view the user interface. Contact the system administrator.
 Insufficient Permissions home

I restarted nifi so I could "tail -f" nifi-app.log.
After restart, I once again tried to hit my NiFi URL.
This time though, the browser failed to present the admin2 cert for
selection.  Shouldn't it have still presented that to me in the browser for
my selection?
Do you have any thoughts why this behavior is occurring?

Would you say it is advisable to manually create an
authorizations.xml file should I continue to experience Insufficient
Permissions problems? I recall reading that users.xml and
authorizations.xml - if absent at initial startup - should be created by
nifi from info in authorizers.xml. But this Insufficient Permissions makes
me suspect something is missing from authorizations.

Jim

On Wed, Apr 24, 2024 at 5:33 AM Isha Lamboo 
wrote:

> Hi James,
>
>
>
> Have you changed these settings in authorizers.xml since you first started
> NiFi? If so, you may need to delete users.xml and authorizations.xml.
>
> A new admin user will not be created if those files already exist.
>
>
>
> Otherwise, the trickiest part is usually that the user DN needs to match *
> *exactly** with that specified. Capitals and whitespace matter. Since you
> are getting insufficient permissions instead of unknown user, I don’t think
> that’s your problem here. Still, it may be worth checking for a mismatch in
> the initial admin identity vs initial user identity vs certificate.
>
>
>
> Regards,
>
>
>
> Isha
>
>
>
> *Van:* James McMahon 
> *Verzonden:* woensdag 24 april 2024 02:14
> *Aan:* users 
> *Onderwerp:* Insufficient permissions on initial start up (NiFi 2.0)
>
>
>
> I am trying to start my new NiFi 2.0 installation. I have a user admin2
> that has a cert. The nifi server also has a cert. Both are signed by the
> same CA.
>
>
>
> At start up in my browser I am denied due to insufficient privileges:
>
>
>
> Unable to view the user interface. Contact the system administrator.
>
> Insufficient Permissions home
>
>
>
>
>
> My authorizers.xml has been configured as follows:
>
> 
> 
> file-user-group-provider
> org.apache.nifi.authorization.FileUserGroupProvider
> /opt/nifi/config_resources/users.xml
> 
> C = US, ST = Virginia, L
> = Reston, O = C4 Rampart, OU = NIFI, CN = admin2
> 
> 
> file-access-policy-provider
>
> org.apache.nifi.authorization.FileAccessPolicyProvider
> file-user-group-provider
> /opt/nifi/config_resources/authorizations.xml
> C = US, ST = Virginia, L =
> Reston, O = C4 Rampart, OU = NIFI, CN = admin2
> 
> 
> 
> 
> managed-authorizer
>
> org.apache.nifi.authorization.StandardManagedAuthorizer
> file-access-policy-provider
> 
> 
>
>
>
> I read that at start up, authorizations.xml and users.xml would be created
> by NiFi - those files are not to be hand jammed.
>
>
>
> So how do I actually get in with my admin2 user?
>
> What have I overlooked on this magical mystery tour?
>
>
>
>
>


Insufficient permissions on initial start up (NiFi 2.0)

2024-04-23 Thread James McMahon
I am trying to start my new NiFi 2.0 installation. I have a user admin2
that has a cert. The nifi server also has a cert. Both are signed by the
same CA.

At start up in my browser I am denied due to insufficient privileges:

Unable to view the user interface. Contact the system administrator.
Insufficient Permissions home


My authorizers.xml has been configured as follows:


file-user-group-provider
org.apache.nifi.authorization.FileUserGroupProvider
/opt/nifi/config_resources/users.xml

C = US, ST = Virginia, L =
Reston, O = C4 Rampart, OU = NIFI, CN = admin2


file-access-policy-provider

org.apache.nifi.authorization.FileAccessPolicyProvider
file-user-group-provider
/opt/nifi/config_resources/authorizations.xml
C = US, ST = Virginia, L =
Reston, O = C4 Rampart, OU = NIFI, CN = admin2




managed-authorizer

org.apache.nifi.authorization.StandardManagedAuthorizer
file-access-policy-provider
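
(The archive strips the XML markup from the snippet above. For reference, a
reconstruction of a stock authorizers.xml carrying those values would look
roughly like the following; the element and property names are taken from
the default template shipped with NiFi, so treat them as assumed rather than
quoted from my file.)

<authorizers>
    <userGroupProvider>
        <identifier>file-user-group-provider</identifier>
        <class>org.apache.nifi.authorization.FileUserGroupProvider</class>
        <property name="Users File">/opt/nifi/config_resources/users.xml</property>
        <property name="Initial User Identity 1">C = US, ST = Virginia, L = Reston, O = C4 Rampart, OU = NIFI, CN = admin2</property>
    </userGroupProvider>
    <accessPolicyProvider>
        <identifier>file-access-policy-provider</identifier>
        <class>org.apache.nifi.authorization.FileAccessPolicyProvider</class>
        <property name="User Group Provider">file-user-group-provider</property>
        <property name="Authorizations File">/opt/nifi/config_resources/authorizations.xml</property>
        <property name="Initial Admin Identity">C = US, ST = Virginia, L = Reston, O = C4 Rampart, OU = NIFI, CN = admin2</property>
    </accessPolicyProvider>
    <authorizer>
        <identifier>managed-authorizer</identifier>
        <class>org.apache.nifi.authorization.StandardManagedAuthorizer</class>
        <property name="Access Policy Provider">file-access-policy-provider</property>
    </authorizer>
</authorizers>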



I read that at start up, authorizations.xml and users.xml would be created
by NiFi - those files are not to be hand jammed.

So how do I actually get in with my admin2 user?
What have I overlooked on this magical mystery tour?


Re: How to upload flow definition to new 2.0 instance?

2024-04-19 Thread James McMahon
Thank you Pierre. I appreciate your reply. I have not yet done this but
intend to, as soon as I can figure out how to get my NiFi 2.0 instance to
come up securely with certs and TLS. I do and did not want to load up my
flows into a nonsecured NiFi instance, which is why I have taken so long to
reply back with any results. The secured instance is not yet working, as
you likely saw in my other post.
Respectfully,
Jim

On Thu, Apr 11, 2024 at 3:05 AM Pierre Villard 
wrote:

> In NiFi 1.x, right click on process group, download flow definition. It'll
> give you a JSON file. In NiFi 2, drag and drop a process group on the
> canvas, and click the upload icon on the right of the name input. You'll be
> able to select your JSON file.
>
> HTH
>
> Le jeu. 11 avr. 2024 à 02:14, James McMahon  a
> écrit :
>
>> I had developed an extensive NiFi Flow in v1.16. I have initialized an
>> instance of NiFi 2.0.
>>
>> I downloaded my flow to file NiFi_Flow.json from my 1.16 instance.
>> But I can find no way to import this or load this to my 2.0 instance.
>> We currently have no Registry.
>> How can I do this?
>>
>


Re: Unable to securely connect to NiFi 2.0 instance

2024-04-19 Thread James McMahon
Thank you Isha. I will try this and see what I find. I'll report back here.
I appreciate your reply.
Cheers,
Jim

On Fri, Apr 19, 2024 at 4:37 AM Isha Lamboo 
wrote:

> Hi James,
>
>
>
> I would suggest you try to debug this using the openssl s_client command,
> something like this:
>
>
>
> openssl s_client -connect : -debug -cert client.pem -key
> clientkey.pem -CAfile rootcert.pem
>
>
>
> This should give you a lot of details, including information from the
> server that specifies which CAs it will accept for client certs.
>
>
>
> Regards,
>
>
>
> Isha
>
>
>
>
>
> *Van:* James McMahon 
> *Verzonden:* vrijdag 19 april 2024 01:17
> *Aan:* users 
> *Onderwerp:* Re: Unable to securely connect to NiFi 2.0 instance
>
>
>
> I started from scratch. Got nifi to start, no errors at all in my
> nifi-app.log. Configured the client certs in my Chrome browser, also added
> cacert.pem to my Root Trusted CAs.
>
> Tried to hit https://ec2-44-219-227-80.compute-1.amazonaws.com:8443/nifi
> , continue to get rejected with this message from the browser:
>
> This site can’t provide a secure
> connectionec2-44-219-227-80.compute-1.amazonaws.com didn’t accept your
> login certificate, or one may not have been provided.
> Try contacting the system admin.
> ERR_BAD_SSL_CLIENT_AUTH_CERT
>
>
>
> I never get prompted to select a client cert.
>
>
>
> Anyone have any thoughts - fixing, debugging, anything?
>
>
>
> On Wed, Apr 17, 2024 at 8:44 PM James McMahon 
> wrote:
>
> I have installed and configured NiFi 2.0 with TLS. My nifi 2.0 instance
> appears to start without errors, judging by the contents of nifi-app.log.
>
>
>
> When I try to access my nifi instance through its https setting in
> nifi.properties, I get this error in my browser:
>
>
>
> This site can’t provide a secure connection
>
> ec2-44-219-227-80.compute-1.amazonaws.com didn’t accept your login
> certificate, or one may not have been provided.
> Try contacting the system admin.
> ERR_BAD_SSL_CLIENT_AUTH_CERT
>
>
>
> Normally I would expect to be prompted to select admin's login cert from
> the list of trusted certs. But I am not getting prompted - it just throws
> the error.
>
>
>
> I had employed tinycert.org to generate my cacert.pem, my server cert and
> private key, and a client cert and private key for my admin user.
>
>
>
> This is how I brought the server private key and cert into my keystore:
>
> openssl pkcs12 -export -out keystore.p12 -inkey ec2-44-219-227-80-key.pem
> -in ec2-44-219-227-80.pem -certfile cacert.pem
>
>
>
> This is how I imported my cacert into the nifi truststore with java
> keytool:
>
> keytool -import -alias "CACert" -file cacert.pem -keystore truststore.jks
> -storepass 
>
>
>
> This is how I converted my client cert and key, which I then added to my
> browser cert store:
>
> openssl pkcs12 -export -out admin.p12 -inkey admin-key.pem -in admin.pem
> -certfile cacert.pem
>
>
>
> I have configured the cacert in my nifi truststore.jks. I have the server
> cert and private key in my keystore.p12. (I had read that jks for one and
> p12 for the other is not an issue).
>
>
>
> I have installed the cert and private key for user admin in my Chrome
> browser. I also installed the cacert.pem CA in my browser trusted root
> store.
>
>
>
> Here are my keystore, truststore, and https params in nifi.properties:
>
> nifi.web.https.host=ec2-44-219-227-80.compute-1.amazonaws.com
> nifi.web.https.port=8443
>
> ...
> nifi.security.keystore=/opt/nifi/config_resources/keys/keystore.p12
> nifi.security.keystoreType=PKCS12
> nifi.security.keystorePasswd=<.>
> nifi.security.keyPasswd=<.>
> nifi.security.truststore=/opt/nifi/config_resources/keys/truststore.jks
> nifi.security.truststoreType=JKS
> nifi.security.truststorePasswd=
>
>
>
> My authorizers.xml file is configured like this:
> 
> 
>   
>   
>   
>   
> file-user-group-provider
> org.apache.nifi.authorization.FileUserGroupProvider
> /opt/nifi/config_resources/users.xml
> CN=admin, OU=NIFI
>   
>   
> file-access-policy-provider
> org.apache.nifi.authorization.FileAccessPolicyProvider
> file-user-group-provider
> CN=admin, OU=NIFI
> /opt/nifi/config_resources/authorizations.xml
>   
>   
> managed-authorizer
> org.apache.nifi.authorization.StandardManagedAuthorizer
> file-access-policy-provider
>   
> 
>
>
>
> My Security Group on my ec2 instance has a rule to permit 8443 for my IP
> address.
>
>
>
> What have I overlooked? Thanks in advance for any help.
>
>
>
>
>
>


Re: Unable to securely connect to NiFi 2.0 instance

2024-04-18 Thread James McMahon
I started from scratch. Got nifi to start, no errors at all in my
nifi-app.log. Configured the client certs in my Chrome browser, also added
cacert.pem to my Root Trusted CAs.
Tried to hit https://ec2-44-219-227-80.compute-1.amazonaws.com:8443/nifi ,
continue to get rejected with this message from the browser:

This site can’t provide a secure
connectionec2-44-219-227-80.compute-1.amazonaws.com didn’t accept your
login certificate, or one may not have been provided.
Try contacting the system admin.
ERR_BAD_SSL_CLIENT_AUTH_CERT

I never get prompted to select a client cert.

Anyone have any thoughts - fixing, debugging, anything?

On Wed, Apr 17, 2024 at 8:44 PM James McMahon  wrote:

> I have installed and configured NiFi 2.0 with TLS. My nifi 2.0 instance
> appears to start without errors, judging by the contents of nifi-app.log.
>
> When I try to access my nifi instance through its https setting in
> nifi.properties, I get this error in my browser:
>
> This site can’t provide a secure connection
> ec2-44-219-227-80.compute-1.amazonaws.com didn’t accept your login
> certificate, or one may not have been provided.
> Try contacting the system admin.
> ERR_BAD_SSL_CLIENT_AUTH_CERT
>
> Normally I would expect to be prompted to select admin's login cert from
> the list of trusted certs. But I am not getting prompted - it just throws
> the error.
>
> I had employed tinycert.org to generate my cacert.pem, my server cert and
> private key, and a client cert and private key for my admin user.
>
> This is how I brought the server private key and cert into my keystore:
> openssl pkcs12 -export -out keystore.p12 -inkey ec2-44-219-227-80-key.pem
> -in ec2-44-219-227-80.pem -certfile cacert.pem
>
> This is how I imported my cacert into the nifi truststore with java
> keytool:
> keytool -import -alias "CACert" -file cacert.pem -keystore truststore.jks
> -storepass 
>
> This is how I converted my client cert and key, which I then added to my
> browser cert store:
> openssl pkcs12 -export -out admin.p12 -inkey admin-key.pem -in admin.pem
> -certfile cacert.pem
>
> I have configured the cacert in my nifi truststore.jks. I have the server
> cert and private key in my keystore.p12. (I had read that jks for one and
> p12 for the other is not an issue).
>
> I have installed the cert and private key for user admin in my Chrome
> browser. I also installed the cacert.pem CA in my browser trusted root
> store.
>
> Here are my keystore, truststore, and https params in nifi.properties:
> nifi.web.https.host=ec2-44-219-227-80.compute-1.amazonaws.com
> nifi.web.https.port=8443
> ...
> nifi.security.keystore=/opt/nifi/config_resources/keys/keystore.p12
> nifi.security.keystoreType=PKCS12
> nifi.security.keystorePasswd=<.>
> nifi.security.keyPasswd=<.>
> nifi.security.truststore=/opt/nifi/config_resources/keys/truststore.jks
> nifi.security.truststoreType=JKS
> nifi.security.truststorePasswd=
>
> My authorizers.xml file is configured like this:
> 
> 
>   
>   
>   
>   
> file-user-group-provider
> org.apache.nifi.authorization.FileUserGroupProvider
> /opt/nifi/config_resources/users.xml
> CN=admin, OU=NIFI
>   
>   
> file-access-policy-provider
> org.apache.nifi.authorization.FileAccessPolicyProvider
> file-user-group-provider
> CN=admin, OU=NIFI
> /opt/nifi/config_resources/authorizations.xml
>   
>   
> managed-authorizer
> org.apache.nifi.authorization.StandardManagedAuthorizer
> file-access-policy-provider
>   
> 
>
> My Security Group on my ec2 instance has a rule to permit 8443 for my IP
> address.
>
> What have I overlooked? Thanks in advance for any help.
>
>
>


Unable to securely connect to NiFi 2.0 instance

2024-04-17 Thread James McMahon
I have installed and configured NiFi 2.0 with TLS. My nifi 2.0 instance
appears to start without errors, judging by the contents of nifi-app.log.

When I try to access my nifi instance through its https setting in
nifi.properties, I get this error in my browser:

This site can’t provide a secure connection
ec2-44-219-227-80.compute-1.amazonaws.com didn’t accept your login
certificate, or one may not have been provided.
Try contacting the system admin.
ERR_BAD_SSL_CLIENT_AUTH_CERT

Normally I would expect to be prompted to select admin's login cert from
the list of trusted certs. But I am not getting prompted - it just throws
the error.

I had employed tinycert.org to generate my cacert.pem, my server cert and
private key, and a client cert and private key for my admin user.

This is how I brought the server private key and cert into my keystore:
openssl pkcs12 -export -out keystore.p12 -inkey ec2-44-219-227-80-key.pem
-in ec2-44-219-227-80.pem -certfile cacert.pem

This is how I imported my cacert into the nifi truststore with java keytool:
keytool -import -alias "CACert" -file cacert.pem -keystore truststore.jks
-storepass 

This is how I converted my client cert and key, which I then added to my
browser cert store:
openssl pkcs12 -export -out admin.p12 -inkey admin-key.pem -in admin.pem
-certfile cacert.pem

I have configured the cacert in my nifi truststore.jks. I have the server
cert and private key in my keystore.p12. (I had read that jks for one and
p12 for the other is not an issue).

I have installed the cert and private key for user admin in my Chrome
browser. I also installed the cacert.pem CA in my browser trusted root
store.

Here are my keystore, truststore, and https params in nifi.properties:
nifi.web.https.host=ec2-44-219-227-80.compute-1.amazonaws.com
nifi.web.https.port=8443
...
nifi.security.keystore=/opt/nifi/config_resources/keys/keystore.p12
nifi.security.keystoreType=PKCS12
nifi.security.keystorePasswd=<.>
nifi.security.keyPasswd=<.>
nifi.security.truststore=/opt/nifi/config_resources/keys/truststore.jks
nifi.security.truststoreType=JKS
nifi.security.truststorePasswd=

My authorizers.xml file is configured like this:


  
  
  
  
file-user-group-provider
org.apache.nifi.authorization.FileUserGroupProvider
/opt/nifi/config_resources/users.xml
CN=admin, OU=NIFI
  
  
file-access-policy-provider
org.apache.nifi.authorization.FileAccessPolicyProvider
file-user-group-provider
CN=admin, OU=NIFI
/opt/nifi/config_resources/authorizations.xml
  
  
managed-authorizer
org.apache.nifi.authorization.StandardManagedAuthorizer
file-access-policy-provider
  


My Security Group on my ec2 instance has a rule to permit 8443 for my IP
address.

What have I overlooked? Thanks in advance for any help.
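
One hedged debugging step: check which client-certificate CAs the server
advertises during the TLS handshake, since the browser will only offer
client certs issued by one of those CAs. An illustrative command (host and
port as above; the exact output and section markers vary by openssl
version):

openssl s_client -connect ec2-44-219-227-80.compute-1.amazonaws.com:8443 </dev/null 2>/dev/null \
  | sed -n '/Acceptable client certificate CA names/,/Client Certificate Types/p'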


How to upload flow definition to new 2.0 instance?

2024-04-10 Thread James McMahon
I had developed an extensive NiFi Flow in v1.16. I have initialized an
instance of NiFi 2.0.

I downloaded my flow to file NiFi_Flow.json from my 1.16 instance.
But I can find no way to import this or load this to my 2.0 instance.
We currently have no Registry.
How can I do this?


Re: JoltTransformJSON error from NiFi, but is valid in tester

2024-02-19 Thread James McMahon
Thank you Arvind. I removed the comment but the processor continues to fail
its configuration check with the same error.
Our nifi version is 1.16.3.

I experimented with other configuration properties of the processor and
seem to have hit on success. I had tried setting Jolt Transformation DSL to
Shift, then to Default, then to Modify - Default. I finally understood that
because I was executing two transforms - a Shift, then a Default - it
should probably be set to Chain. And that worked.

Thank you again.
--
Jim

On Mon, Feb 19, 2024 at 12:02 AM Arvind Singh Tomar <
arvind.to...@hotwaxsystems.com> wrote:

> Hi James,
>
> Sharing your Nifi version would help identify the issue.
>
> However, one thing which I encountered while working with Nifi version
> 1.23.2 that the processor does not like embedded comments in the
> transformations. I needed to remove all the comments.
>
> I tested yours by removing the comment and it seems to be working fine:
>
> [
>   {
> "operation": "shift",
> "spec": {
>   "ids": "ids",
>   "parents": "parents",
>   "dates": "dates",
>   "triage": "triage",
>   "payload": "payload"
> }
>   },
>   {
> "operation": "default",
> "spec": {
>   "payload": ""
> }
>   }
> ]
>
> Regards
> --
> Arvind Singh Tomar
>
>
> On Mon, Feb 19, 2024 at 8:49 AM James McMahon 
> wrote:
>
>> I have this JSON as flowfile content:
>>
>> {
>>   "dates" : {
>> "date_file" : "20240115184407",
>> "ingested" : "20240217175748",
>> "latest_date" : "1980",
>> "earliest_date" : "1980",
>> "date_info" : "MMDD"
>>   },
>>   "parents" : {
>> "md5" : "86107362084b86ea64dc33dfde5e14ff",
>> "sha256" :
>> "4ffe010f3392dddb3a880f0e60a6bf35e1e41444b7e83be0746097557d555e0f"
>>   },
>>   "triage" : {
>> "filetype" : "unstructured",
>> "languages" : "[\"gaelic\"]",
>> "filename" : "gaelicTest.txt"
>>   },
>>   "ids" : {
>> "md5" : "86107362084b86ea64dc33dfde5e14ff",
>> "sha256" :
>> "4ffe010f3392dddb3a880f0e60a6bf35e1e41444b7e83be0746097557d555e0f",
>> "sku" : "0"
>>   }
>> }
>>
>> I am trying to apply a simple JOLT transform that adds a key payload at
>> the same level in my json as keys dates, parents, triage, and ids.
>>
>> I developed this transform - just about as simple a case you could
>> possibly hope for:
>> [
>>   {
>> "operation": "shift",
>> "spec": {
>>   "ids": "ids",
>>   "parents": "parents",
>>   "dates": "dates",
>>   "triage": "triage",
>>   "payload": "payload"
>> }
>>   },
>>   {
>> "operation": "default",
>> "spec": {
>>   // if payload does not exist, then apply a default of null
>>   "payload": ""
>> }
>>   }
>> ]
>>
>> I tested this successfully here: https://jolt-demo.appspot.com/#inception
>> .
>> It works.
>> No big surprise - this is about as simple as it gets.
>>
>> Should work in nifi, right? Can't get much easier than this.
>>
>> But in nifi, it throws this error:
>>
>> JoltTransformJSON[id=4d6c3f69-a72e-16b2-8cfe-2a9adb9303c7] processor is not 
>> valid: : com.bazaarvoice.jolt.exception.SpecException: Shiftr expected a 
>> spec of Map type, got ArrayList
>>
>>
>> Why? How can I get this to work in NiFi?
>>
>>
>>
>>


JoltTransformJSON error from NiFi, but is valid in tester

2024-02-18 Thread James McMahon
I have this JSON as flowfile content:

{
  "dates" : {
"date_file" : "20240115184407",
"ingested" : "20240217175748",
"latest_date" : "1980",
"earliest_date" : "1980",
"date_info" : "MMDD"
  },
  "parents" : {
"md5" : "86107362084b86ea64dc33dfde5e14ff",
"sha256" :
"4ffe010f3392dddb3a880f0e60a6bf35e1e41444b7e83be0746097557d555e0f"
  },
  "triage" : {
"filetype" : "unstructured",
"languages" : "[\"gaelic\"]",
"filename" : "gaelicTest.txt"
  },
  "ids" : {
"md5" : "86107362084b86ea64dc33dfde5e14ff",
"sha256" :
"4ffe010f3392dddb3a880f0e60a6bf35e1e41444b7e83be0746097557d555e0f",
"sku" : "0"
  }
}

I am trying to apply a simple JOLT transform that adds a key payload at the
same level in my json as keys dates, parents, triage, and ids.

I developed this transform - just about as simple a case you could possibly
hope for:
[
  {
"operation": "shift",
"spec": {
  "ids": "ids",
  "parents": "parents",
  "dates": "dates",
  "triage": "triage",
  "payload": "payload"
}
  },
  {
"operation": "default",
"spec": {
  // if payload does not exist, then apply a default of null
  "payload": ""
}
  }
]

I tested this successfully here: https://jolt-demo.appspot.com/#inception .
It works.
No big surprise - this is about as simple as it gets.

Should work in nifi, right? Can't get much easier than this.

But in nifi, it throws this error:

JoltTransformJSON[id=4d6c3f69-a72e-16b2-8cfe-2a9adb9303c7] processor
is not valid: : com.bazaarvoice.jolt.exception.SpecException: Shiftr
expected a spec of Map type, got ArrayList


Why? How can I get this to work in NiFi?


Re: Can we access Queued Duration as an attribute?

2024-02-16 Thread James McMahon
Good to know - very helpful. Thank you both, Joe W. and Mark P.

On Thu, Feb 15, 2024 at 10:42 AM Joe Witt  wrote:

> This [1] blog seems amazingly appropriate and wow do we need these/any
> such fields we intend to truly honor in a prominent place in the docs.
> Super useful...
>
> [1] https://jameswing.net/nifi/nifi-internal-fields.html
>
> Thanks
>
> On Thu, Feb 15, 2024 at 8:35 AM Mark Payne  wrote:
>
>> Jim,
>>
>> You can actually reference “lastQueueDate” in Expression Language. It is
>> formatted as number of milliseconds since epoch.
>>
>> So you might have a RouteOnAttribute that has a property named “old” with
>> a value of:
>> ${lastQueueDate:lt( ${now():minus(10000)} )}
>>
>> So any FlowFile that has been queued for more than 10 seconds would be
>> routed to “old”, anything else to “unmatched”
>>
>> Thanks
>> -Mark
>>
>>
>> On Feb 15, 2024, at 10:18 AM, James McMahon  wrote:
>>
>> That would work - what a good suggestion. I'll do that. I can format the
>> resulting number and then RouteOnAttribute by the desired subset of the
>> result.
>> Something like this to set attribute dt.failure:
>>
>> ${now():toNumber():toDate("yyyy-MM-dd HH:mm:ss"):format("yyyyMMddHHmmss","EST")}
>> Then I can effectively route the files.
>> Thank you Jim S.
>>
>> On Thu, Feb 15, 2024 at 9:55 AM Jim Steinebrey 
>> wrote:
>>
>>> You could add an UpdateAttribute processor first in the failure path to
>>> add a new attribute which contains the time the error occurred by using the
>>> ${now()} or ${now():toNumber()} expression language function.
>>>
>>> https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#now
>>>
>>> Then later on in the flow you can compare current time to the saved
>>> error time to see how much time has elapsed.
>>>
>>> — Jim
>>>
>>>
>>> On Feb 15, 2024, at 9:44 AM, James McMahon  wrote:
>>>
>>> As it turns out lineageStartDate and Queued Duration are very different.
>>> Without being able to get at Queued Duration as an attribute, it appears we
>>> cannot RouteOnAttribute to filter thousands in a queue by anything like
>>> hours they have been in queue.
>>> Why would this be helpful? Let us say we have an InvokeHttp processor
>>> making calls to a REST endpoint. We leave for a weekend and return to find
>>> 5000 files in the Failure queue from this processor. It would be most
>>> helpful to identify the start time and end time of these 5000 failures. We
>>> can't do that reviewing only the first 100 flowfiles in the queue from the
>>> UI.
>>> One can make an assumption that all of these 5000 flowfiles that failed
>>> InvokeHttp share a similar range of lineageStartDate, but that will not
>>> necessarily be true depending on flow complexity.
>>>
>>> On Wed, Feb 14, 2024 at 9:49 AM James McMahon 
>>> wrote:
>>>
>>>> What a great workaround, thank you once again Mike. I'll put this in
>>>> and use it now.
>>>> Jim
>>>>
>>>> On Tue, Feb 13, 2024 at 4:41 PM Michael Moser 
>>>> wrote:
>>>>
>>>>> Hello James,
>>>>>
>>>>> I'm not aware of a way to access Queued Duration using expression
>>>>> language, but you can access the Lineage Duration information.  The 
>>>>> Getting
>>>>> Started Guide mentions both entryDate and lineageStartDate as immutable
>>>>> attributes on all flowfiles.  These are numbers of milliseconds since
>>>>> epoch.  If you need them in a readable format, you can use the format()
>>>>> function.
>>>>>
>>>>> simple examples:
>>>>> ${entryDate} = 1707859943778
>>>>> ${lineageStartDate} = 1707859943778
>>>>> ${lineageStartDate:format("yyyy-MM-dd HH:mm:ss.SSS")} = 2024-02-13
>>>>> 21:32:23.778
>>>>>
>>>>> -- Mike
>>>>>
>>>>>
>>>>> On Mon, Feb 12, 2024 at 11:38 AM James McMahon 
>>>>> wrote:
>>>>>
>>>>>> When we examine the contents of a queue through the UI and select a
>>>>>> flowfile from the resulting list, we see FlowFile Details in the Details
>>>>>> tab. Are those key/values accessible from nifi expression language? I 
>>>>>> would
>>>>>> like to access Queued Duration. I have a queue that holds flowfiles with
>>>>>> non-successful return codes for calls to REST services, and I want to 
>>>>>> route
>>>>>> depending on how long these flowfiles have been sitting in my error queue
>>>>>> to isolate the window when the REST service was unavailable.
>>>>>> Thank you for any examples that show how we can access these keys and
>>>>>> values.
>>>>>>
>>>>>
>>>
>>


Re: ExecuteStreamCommand failing to unzip incoming flowfiles

2024-02-15 Thread James McMahon
This is proving to be difficult to do in practice. Many of the filenames in
the zip contain spaces and other characters, and these are failing to be
passed to the tar successfully.
This is the command I am testing at the command line to first extract the
filenames:
  unzip -l "budgets-state-govs (1).zip" | awk '$4 != "" {print $4}' |
egrep -v "^Name$" | egrep -v ".*.*"
Note that I am forced to weed out all the lines that do not contain
filenames. And I have not yet tried to get any case working where there are
recursive directories in the zip.

This yields partial names such as Virginia for files with names like
Virginia 2023.mdb
When I try to pass Virginia to the tar command following the pipe, tar
chokes on these and fails.

I will continue to hammer away at this. If I find a solution I can employ
in an ExecuteStreamCommand, I will circle back and post it.
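
In the meantime, one workaround that sidesteps filenames with spaces
entirely is to fill in the unzip/tar placeholder lines in the script Mike
quoted below, extracting the whole archive to a temp directory and
re-packaging it rather than parsing the "unzip -l" listing. The flags here
are standard unzip/tar options, but this exact script is a sketch I have not
yet run against the flow described above:

#!/bin/bash
# Hedged sketch: zip -> tar repack for ExecuteStreamCommand.
tmpzipfile=$(mktemp)
tmptarfile=$(mktemp)
rm -f "$tmptarfile"
tmpdir=$(mktemp -d)

# incoming flowfile content becomes the temp zip
cat /dev/stdin >> "$tmpzipfile"

# extract quietly (unzip restores entry timestamps), then tar the extracted tree
unzip -qq "$tmpzipfile" -d "$tmpdir"
tar -cf "$tmptarfile" -C "$tmpdir" .

# the tar bytes become the outgoing flowfile content
cat "$tmptarfile" >> /dev/stdout

# cleanup
rm -f "$tmpzipfile" "$tmptarfile"
rm -rf "$tmpdir"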

On Fri, Feb 2, 2024 at 6:03 PM Michael Moser  wrote:

> Yes, that's exactly what those commands do.  Your linux commands like
> unzip and tar can probably read directly from /dev/stdin and write directly
> to /dev/stdout if you want to.
>
> -- Mike
>
>
> On Fri, Feb 2, 2024 at 9:22 AM James McMahon  wrote:
>
>> Hi Michael. This is a very clever approach: convert from a zip (which
>> UnpackContent does not preserve file metadata for extracted files) to a tar
>> (for which UnpackContent does preserve file metadata), then employ the
>> UnpackContent.
>>
>> One quick followup question. The ExecuteStreamCommand will be in the nifi
>> flow, and so its input will be streaming incoming flowfiles, and its output
>> will be streamed as a flowfile. Are these two commands in the script where
>> we capture the incoming flowfile
>>
>> cat /dev/stdin >> $tmpzipfile
>>
>> ...and where we create the output flowfile from the ExecuteStreamCommand
>> processor?
>>
>> cat $tmptarfile >> /dev/stdout
>>
>>
>> On Thu, Feb 1, 2024 at 10:11 AM Michael Moser  wrote:
>>
>>> Hi Jim,
>>>
>>> The ExecuteStreamCommand will only output 1 flowfile, so using it to
>>> unzip in this fashion won't yield the results you need.
>>>
>>> Instead, you might try a workaround with ExecuteStreamCommand to unzip
>>> your file and then tar to repackage it.  Then UnpackContent should be able
>>> to read the tar file metadata.  I have used ExecuteStreamCommand to execute
>>> bash scripts.  An example is shown below, which you can modify for your
>>> needs.  The ExecuteStreamCommand properties "Command Path=/bin/bash" and
>>> "Command Arguments=/path/to/script.sh" is all you need for this script to
>>> work.
>>>
>>> #!/bin/bash
>>> tmpzipfile=$(mktemp)
>>> tmptarfile=$(mktemp)
>>> #remove the tmptarfile file, we just need a temporary filename, and will
>>> recreate it below
>>> rm -f $tmptarfile
>>> #create a directory to unzip files to
>>> tmpdir=$(mktemp -d)
>>>
>>> cat /dev/stdin >> $tmpzipfile
>>> # here is your unzip command to unzip $tmpzipfile to $tmpdir, preserving
>>> file metadata
>>> # here is your tar command to tar $tmpdir to $tmptarfile
>>> cat $tmptarfile >> /dev/stdout
>>>
>>> #cleanup
>>> rm -f $tmpzipfile
>>> rm -f $tmptarfile
>>> rm -rf $tmpdir
>>>
>>>
>>>
>>> On Wed, Jan 31, 2024 at 12:55 PM James McMahon 
>>> wrote:
>>>
>>>> If anyone can show me how to get my ExecuteStreamCommand configured
>>>> properly as a workaround, I am still interested in that.
>>>> Jim
>>>>
>>>> On Wed, Jan 31, 2024 at 12:39 PM James McMahon 
>>>> wrote:
>>>>
>>>>> I tried to find a Create option for tickets here,
>>>>> https://issues.apache.org/jira/projects/NIFI/issues/NIFI-11859?filter=allopenissues
>>>>> .
>>>>> I did not find one, and suspect maybe I have no such privilege perhaps?
>>>>> In any case, thank you for creating that.
>>>>> Jim
>>>>>
>>>>> On Wed, Jan 31, 2024 at 12:37 PM Joe Witt  wrote:
>>>>>
>>>>>> I went ahead and wrote it up here
>>>>>> https://issues.apache.org/jira/browse/NIFI-12709
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Wed, Jan 31, 2024 at 10:30 AM James McMahon 
>>>>>> wrote:
>>>>>>
>>>>>>> Happy to do that Joe. How do I create and 

Re: Can we access Queued Duration as an attribute?

2024-02-15 Thread James McMahon
That would work - what a good suggestion. I'll do that. I can format the
resulting number and then RouteOnAttribute by the desired subset of the
result.
Something like this to set attribute dt.failure:
${now():toNumber():toDate("yyyy-MM-dd HH:mm:ss"):format("yyyyMMddHHmmss","EST")}
Then I can effectively route the files.
Thank you Jim S.

On Thu, Feb 15, 2024 at 9:55 AM Jim Steinebrey 
wrote:

> You could add an UpdateAttribute processor first in the failure path to
> add a new attribute which contains the time the error occurred by using the
> ${now()} or ${now():toNumber()} expression language function.
>
> https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#now
>
> Then later on in the flow you can compare current time to the saved error
> time to see how much time has elapsed.
>
> — Jim
>
>
> On Feb 15, 2024, at 9:44 AM, James McMahon  wrote:
>
> As it turns out lineageStartDate and Queued Duration are very different.
> Without being able to get at Queued Duration as an attribute, it appears we
> cannot RouteOnAttribute to filter thousands in a queue by anything like
> hours they have been in queue.
> Why would this be helpful? Let us say we have an InvokeHttp processor
> making calls to a REST endpoint. We leave for a weekend and return to find
> 5000 files in the Failure queue from this processor. It would be most
> helpful to identify the start time and end time of these 5000 failures. We
> can't do that reviewing only the first 100 flowfiles in the queue from the
> UI.
> One can make an assumption that all of these 5000 flowfiles that failed
> InvokeHttp share a similar range of lineageStartDate, but that will not
> necessarily be true depending on flow complexity.
>
> On Wed, Feb 14, 2024 at 9:49 AM James McMahon 
> wrote:
>
>> What a great workaround, thank you once again Mike. I'll put this in and
>> use it now.
>> Jim
>>
>> On Tue, Feb 13, 2024 at 4:41 PM Michael Moser  wrote:
>>
>>> Hello James,
>>>
>>> I'm not aware of a way to access Queued Duration using expression
>>> language, but you can access the Lineage Duration information.  The Getting
>>> Started Guide mentions both entryDate and lineageStartDate as immutable
>>> attributes on all flowfiles.  These are numbers of milliseconds since
>>> epoch.  If you need them in a readable format, you can use the format()
>>> function.
>>>
>>> simple examples:
>>> ${entryDate} = 1707859943778
>>> ${lineageStartDate} = 1707859943778
>>> ${lineageStartDate:format("yyyy-MM-dd HH:mm:ss.SSS")} = 2024-02-13
>>> 21:32:23.778
>>>
>>> -- Mike
>>>
>>>
>>> On Mon, Feb 12, 2024 at 11:38 AM James McMahon 
>>> wrote:
>>>
>>>> When we examine the contents of a queue through the UI and select a
>>>> flowfile from the resulting list, we see FlowFile Details in the Details
>>>> tab. Are those key/values accessible from nifi expression language? I would
>>>> like to access Queued Duration. I have a queue that holds flowfiles with
>>>> non-successful return codes for calls to REST services, and I want to route
>>>> depending on how long these flowfiles have been sitting in my error queue
>>>> to isolate the window when the REST service was unavailable.
>>>> Thank you for any examples that show how we can access these keys and
>>>> values.
>>>>
>>>
>


Re: Can we access Queued Duration as an attribute?

2024-02-15 Thread James McMahon
As it turns out lineageStartDate and Queued Duration are very different.
Without being able to get at Queued Duration as an attribute, it appears we
cannot RouteOnAttribute to filter thousands in a queue by anything like
hours they have been in queue.
Why would this be helpful? Let us say we have an InvokeHttp processor
making calls to a REST endpoint. We leave for a weekend and return to find
5000 files in the Failure queue from this processor. It would be most
helpful to identify the start time and end time of these 5000 failures. We
can't do that reviewing only the first 100 flowfiles in the queue from the
UI.
One can make an assumption that all of these 5000 flowfiles that failed
InvokeHttp share a similar range of lineageStartDate, but that will not
necessarily be true depending on flow complexity.

On Wed, Feb 14, 2024 at 9:49 AM James McMahon  wrote:

> What a great workaround, thank you once again Mike. I'll put this in and
> use it now.
> Jim
>
> On Tue, Feb 13, 2024 at 4:41 PM Michael Moser  wrote:
>
>> Hello James,
>>
>> I'm not aware of a way to access Queued Duration using expression
>> language, but you can access the Lineage Duration information.  The Getting
>> Started Guide mentions both entryDate and lineageStartDate as immutable
>> attributes on all flowfiles.  These are numbers of milliseconds since
>> epoch.  If you need them in a readable format, you can use the format()
>> function.
>>
>> simple examples:
>> ${entryDate} = 1707859943778
>> ${lineageStartDate} = 1707859943778
>> ${lineageStartDate:format("yyyy-MM-dd HH:mm:ss.SSS")} = 2024-02-13
>> 21:32:23.778
>>
>> -- Mike
>>
>>
>> On Mon, Feb 12, 2024 at 11:38 AM James McMahon 
>> wrote:
>>
>>> When we examine the contents of a queue through the UI and select a
>>> flowfile from the resulting list, we see FlowFile Details in the Details
>>> tab. Are those key/values accessible from nifi expression language? I would
>>> like to access Queued Duration. I have a queue that holds flowfiles with
>>> non-successful return codes for calls to REST services, and I want to route
>>> depending on how long these flowfiles have been sitting in my error queue
>>> to isolate the window when the REST service was unavailable.
>>> Thank you for any examples that show how we can access these keys and
>>> values.
>>>
>>


Re: Can we access Queued Duration as an attribute?

2024-02-14 Thread James McMahon
What a great workaround, thank you once again Mike. I'll put this in and
use it now.
Jim

On Tue, Feb 13, 2024 at 4:41 PM Michael Moser  wrote:

> Hello James,
>
> I'm not aware of a way to access Queued Duration using expression
> language, but you can access the Lineage Duration information.  The Getting
> Started Guide mentions both entryDate and lineageStartDate as immutable
> attributes on all flowfiles.  These are numbers of milliseconds since
> epoch.  If you need them in a readable format, you can use the format()
> function.
>
> simple examples:
> ${entryDate} = 1707859943778
> ${lineageStartDate} = 1707859943778
> ${lineageStartDate:format("yyyy-MM-dd HH:mm:ss.SSS")} = 2024-02-13
> 21:32:23.778
>
> -- Mike
>
>
> On Mon, Feb 12, 2024 at 11:38 AM James McMahon 
> wrote:
>
>> When we examine the contents of a queue through the UI and select a
>> flowfile from the resulting list, we see FlowFile Details in the Details
>> tab. Are those key/values accessible from nifi expression language? I would
>> like to access Queued Duration. I have a queue that holds flowfiles with
>> non-successful return codes for calls to REST services, and I want to route
>> depending on how long these flowfiles have been sitting in my error queue
>> to isolate the window when the REST service was unavailable.
>> Thank you for any examples that show how we can access these keys and
>> values.
>>
>


Can we access Queued Duration as an attribute?

2024-02-12 Thread James McMahon
When we examine the contents of a queue through the UI and select a
flowfile from the resulting list, we see FlowFile Details in the Details
tab. Are those key/values accessible from nifi expression language? I would
like to access Queued Duration. I have a queue that holds flowfiles with
non-successful return codes for calls to REST services, and I want to route
depending on how long these flowfiles have been sitting in my error queue
to isolate the window when the REST service was unavailable.
Thank you for any examples that show how we can access these keys and
values.


Re: ExecuteStreamCommand failing to unzip incoming flowfiles

2024-02-02 Thread James McMahon
Hi Michael. This is a very clever approach: convert from a zip (which
UnpackContent does not preserve file metadata for extracted files) to a tar
(for which UnpackContent does preserve file metadata), then employ the
UnpackContent.

One quick followup question. The ExecuteStreamCommand will be in the nifi
flow, and so its input will be streaming incoming flowfiles, and its output
will be streamed as a flowfile. Are these two commands in the script where
we capture the incoming flowfile

cat /dev/stdin >> $tmpzipfile

...and where we create the output flowfile from the ExecuteStreamCommand
processor?

cat $tmptarfile >> /dev/stdout
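
For reference, one way to fill in the two placeholder comment lines in the
script quoted below is shown here. This is an untested sketch that assumes
Info-ZIP unzip and GNU tar are on the PATH; both preserve file modification
times by default, which is the point of the zip-to-tar repackaging:

unzip -qq "$tmpzipfile" -d "$tmpdir"
tar -C "$tmpdir" -cf "$tmptarfile" .

The variables ($tmpzipfile, $tmpdir, $tmptarfile) are the ones already defined
in the script.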


On Thu, Feb 1, 2024 at 10:11 AM Michael Moser  wrote:

> Hi Jim,
>
> The ExecuteStreamCommand will only output 1 flowfile, so using it to unzip
> in this fashion won't yield the results you need.
>
> Instead, you might try a workaround with ExecuteStreamCommand to unzip
> your file and then tar to repackage it.  Then UnpackContent should be able
> to read the tar file metadata.  I have used ExecuteStreamCommand to execute
> bash scripts.  An example is shown below, which you can modify for your
> needs.  The ExecuteStreamCommand properties "Command Path=/bin/bash" and
> "Command Arguments=/path/to/script.sh" is all you need for this script to
> work.
>
> #!/bin/bash
> tmpzipfile=$(mktemp)
> tmptarfile=$(mktemp)
> #remove the tmptarfile file, we just need a temporary filename, and will
> recreate it below
> rm -f $tmptarfile
> #create a directory to unzip files to
> tmpdir=$(mktemp -d)
>
> cat /dev/stdin >> $tmpzipfile
> # here is your unzip command to unzip $tmpzipfile to $tmpdir, preserving
> file metadata
> # here is your tar command to tar $tmpdir to $tmptarfile
> cat $tmptarfile >> /dev/stdout
>
> #cleanup
> rm -f $tmpzipfile
> rm -f $tmptarfile
> rm -rf $tmpdir
>
>
>
> On Wed, Jan 31, 2024 at 12:55 PM James McMahon 
> wrote:
>
>> If anyone can show me how to get my ExecuteStreamCommand configured
>> properly as a workaround, I am still interested in that.
>> Jim
>>
>> On Wed, Jan 31, 2024 at 12:39 PM James McMahon 
>> wrote:
>>
>>> I tried to find a Create option for tickets here,
>>> https://issues.apache.org/jira/projects/NIFI/issues/NIFI-11859?filter=allopenissues
>>> .
>>> I did not find one, and suspect maybe I have no such privilege perhaps?
>>> In any case, thank you for creating that.
>>> Jim
>>>
>>> On Wed, Jan 31, 2024 at 12:37 PM Joe Witt  wrote:
>>>
>>>> I went ahead and wrote it up here
>>>> https://issues.apache.org/jira/browse/NIFI-12709
>>>>
>>>> Thanks
>>>>
>>>> On Wed, Jan 31, 2024 at 10:30 AM James McMahon 
>>>> wrote:
>>>>
>>>>> Happy to do that Joe. How do I create and submit a JIRA for
>>>>> consideration? I have not done one - at least, not for years.
>>>>> If you get me started, I will do a concise and thorough description in
>>>>> the ticket.
>>>>> Sincerely,
>>>>> Jim
>>>>>
>>>>> On Wed, Jan 31, 2024 at 12:12 PM Joe Witt  wrote:
>>>>>
>>>>>> James,
>>>>>>
>>>>>> Makes sense to create a JIRA to improve UnpackContent to extract
>>>>>> these attributes in the event of a zip file that happens to present them.
>>>>>> The concept of lastModifiedDate does appear easily accessed if available 
>>>>>> in
>>>>>> the metadata.  Owner/Creator/Creation information looks less standard in
>>>>>> the case of a Zip but perhaps still capturable as extra fields.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Wed, Jan 31, 2024 at 10:01 AM James McMahon 
>>>>>> wrote:
>>>>>>
>>>>>>> I tried to use UnpackContent to extract the files within a zip file
>>>>>>> named ABC DEF (1).zip. (the filename has spaces in its name).
>>>>>>>
>>>>>>> UnpackContent seemed to work, but it did not preserve file
>>>>>>> attributes from the files in the zip. For example, the
>>>>>>> lastModifiedTime   is not available so downstream I am unable to do
>>>>>>> this: 
>>>>>>> ${file.lastModifiedTime:toDate("yyyy-MM-dd'T'HH:mm:ssZ"):format("yyyyMMddHHmmss")}
>>>>>>>
>>>>>>> I did some digging and found that on the UnpackConten

Re: ExecuteStreamCommand failing to unzip incoming flowfiles

2024-01-31 Thread James McMahon
If anyone can show me how to get my ExecuteStreamCommand configured
properly as a workaround, I am still interested in that.
Jim

On Wed, Jan 31, 2024 at 12:39 PM James McMahon  wrote:

> I tried to find a Create option for tickets here,
> https://issues.apache.org/jira/projects/NIFI/issues/NIFI-11859?filter=allopenissues
> .
> I did not find one, and suspect maybe I have no such privilege perhaps?
> In any case, thank you for creating that.
> Jim
>
> On Wed, Jan 31, 2024 at 12:37 PM Joe Witt  wrote:
>
>> I went ahead and wrote it up here
>> https://issues.apache.org/jira/browse/NIFI-12709
>>
>> Thanks
>>
>> On Wed, Jan 31, 2024 at 10:30 AM James McMahon 
>> wrote:
>>
>>> Happy to do that Joe. How do I create and submit a JIRA for
>>> consideration? I have not done one - at least, not for years.
>>> If you get me started, I will do a concise and thorough description in
>>> the ticket.
>>> Sincerely,
>>> Jim
>>>
>>> On Wed, Jan 31, 2024 at 12:12 PM Joe Witt  wrote:
>>>
>>>> James,
>>>>
>>>> Makes sense to create a JIRA to improve UnpackContent to extract these
>>>> attributes in the event of a zip file that happens to present them.  The
>>>> concept of lastModifiedDate does appear easily accessed if available in the
>>>> metadata.  Owner/Creator/Creation information looks less standard in the
>>>> case of a Zip but perhaps still capturable as extra fields.
>>>>
>>>> Thanks
>>>>
>>>> On Wed, Jan 31, 2024 at 10:01 AM James McMahon 
>>>> wrote:
>>>>
>>>>> I tried to use UnpackContent to extract the files within a zip file
>>>>> named ABC DEF (1).zip. (the filename has spaces in its name).
>>>>>
>>>>> UnpackContent seemed to work, but it did not preserve file attributes
>>>>> from the files in the zip. For example, the  lastModifiedTime   is not
>>>>> available so downstream I am unable to do
>>>>> this: 
>>>>> ${file.lastModifiedTime:toDate("yyyy-MM-dd'T'HH:mm:ssZ"):format("yyyyMMddHHmmss")}
>>>>>
>>>>> I did some digging and found that on the UnpackContent page, it says:
>>>>> file.lastModifiedTime  "The date and time that the unpacked file was
>>>>> last modified (*tar only*)."
>>>>>
>>>>> I need these file attributes for those files I extract from the zip.
>>>>> So as an alternative I tried configuring an ExecuteStreamCommand
>>>>> processor like this:
>>>>> Command Arguments  -c;"unzip -p -q < -"
>>>>> Command Path  /bin/bash
>>>>> Argument Delimiter   ;
>>>>>
>>>>> It throws these errors:
>>>>>
>>>>> 16:41:30 UTCERROR13023d28-6154-17fd-b4e8-7a30b35980ca
>>>>> ExecuteStreamCommand[id=13023d28-6154-17fd-b4e8-7a30b35980ca] Failed to
>>>>> write flow file to stdin due to Broken pipe: java.io.IOException: Broken
>>>>> pipe 16:41:30 UTCERROR13023d28-6154-17fd-b4e8-7a30b35980ca
>>>>> ExecuteStreamCommand[id=13023d28-6154-17fd-b4e8-7a30b35980ca] Transferring
>>>>> flow file FlowFile[filename=ABC DEF (1).zip] to nonzero status. Executable
>>>>> command /bin/bash ended in an error: /bin/bash: -: No such file or 
>>>>> directory
>>>>>
>>>>> It does not seem to be applying the unzip to the stdin of the ESC
>>>>> processor. None of the files in the zip archive are output from ESC.
>>>>>
>>>>> What needs to be changed in my ESC configuration?
>>>>>
>>>>> Thank you in advance for any help.
>>>>>
>>>>>


Re: ExecuteStreamCommand failing to unzip incoming flowfiles

2024-01-31 Thread James McMahon
I tried to find a Create option for tickets here,
https://issues.apache.org/jira/projects/NIFI/issues/NIFI-11859?filter=allopenissues
.
I did not find one, and suspect maybe I have no such privilege perhaps?
In any case, thank you for creating that.
Jim

On Wed, Jan 31, 2024 at 12:37 PM Joe Witt  wrote:

> I went ahead and wrote it up here
> https://issues.apache.org/jira/browse/NIFI-12709
>
> Thanks
>
> On Wed, Jan 31, 2024 at 10:30 AM James McMahon 
> wrote:
>
>> Happy to do that Joe. How do I create and submit a JIRA for
>> consideration? I have not done one - at least, not for years.
>> If you get me started, I will do a concise and thorough description in
>> the ticket.
>> Sincerely,
>> Jim
>>
>> On Wed, Jan 31, 2024 at 12:12 PM Joe Witt  wrote:
>>
>>> James,
>>>
>>> Makes sense to create a JIRA to improve UnpackContent to extract these
>>> attributes in the event of a zip file that happens to present them.  The
>>> concept of lastModifiedDate does appear easily accessed if available in the
>>> metadata.  Owner/Creator/Creation information looks less standard in the
>>> case of a Zip but perhaps still capturable as extra fields.
>>>
>>> Thanks
>>>
>>> On Wed, Jan 31, 2024 at 10:01 AM James McMahon 
>>> wrote:
>>>
>>>> I tried to use UnpackContent to extract the files within a zip file
>>>> named ABC DEF (1).zip. (the filename has spaces in its name).
>>>>
>>>> UnpackContent seemed to work, but it did not preserve file attributes
>>>> from the files in the zip. For example, the  lastModifiedTime   is not
>>>> available so downstream I am unable to do
>>>> this: 
>>>> ${file.lastModifiedTime:toDate("yyyy-MM-dd'T'HH:mm:ssZ"):format("yyyyMMddHHmmss")}
>>>>
>>>> I did some digging and found that on the UnpackContent page, it says:
>>>> file.lastModifiedTime  "The date and time that the unpacked file was
>>>> last modified (*tar only*)."
>>>>
>>>> I need these file attributes for those files I extract from the zip. So
>>>> as an alternative I tried configuring an ExecuteStreamCommand
>>>> processor like this:
>>>> Command Arguments  -c;"unzip -p -q < -"
>>>> Command Path  /bin/bash
>>>> Argument Delimiter   ;
>>>>
>>>> It throws these errors:
>>>>
>>>> 16:41:30 UTCERROR13023d28-6154-17fd-b4e8-7a30b35980ca
>>>> ExecuteStreamCommand[id=13023d28-6154-17fd-b4e8-7a30b35980ca] Failed to
>>>> write flow file to stdin due to Broken pipe: java.io.IOException: Broken
>>>> pipe 16:41:30 UTCERROR13023d28-6154-17fd-b4e8-7a30b35980ca
>>>> ExecuteStreamCommand[id=13023d28-6154-17fd-b4e8-7a30b35980ca] Transferring
>>>> flow file FlowFile[filename=ABC DEF (1).zip] to nonzero status. Executable
>>>> command /bin/bash ended in an error: /bin/bash: -: No such file or 
>>>> directory
>>>>
>>>> It does not seem to be applying the unzip to the stdin of the ESC
>>>> processor. None of the files in the zip archive are output from ESC.
>>>>
>>>> What needs to be changed in my ESC configuration?
>>>>
>>>> Thank you in advance for any help.
>>>>
>>>>


Re: ExecuteStreamCommand failing to unzip incoming flowfiles

2024-01-31 Thread James McMahon
Happy to do that Joe. How do I create and submit a JIRA for consideration?
I have not done one - at least, not for years.
If you get me started, I will do a concise and thorough description in the
ticket.
Sincerely,
Jim

On Wed, Jan 31, 2024 at 12:12 PM Joe Witt  wrote:

> James,
>
> Makes sense to create a JIRA to improve UnpackContent to extract these
> attributes in the event of a zip file that happens to present them.  The
> concept of lastModifiedDate does appear easily accessed if available in the
> metadata.  Owner/Creator/Creation information looks less standard in the
> case of a Zip but perhaps still capturable as extra fields.
>
> Thanks
>
> On Wed, Jan 31, 2024 at 10:01 AM James McMahon 
> wrote:
>
>> I tried to use UnpackContent to extract the files within a zip file named
>> ABC DEF (1).zip. (the filename has spaces in its name).
>>
>> UnpackContent seemed to work, but it did not preserve file attributes
>> from the files in the zip. For example, the  lastModifiedTime   is not
>> available so downstream I am unable to do
>> this: 
>> ${file.lastModifiedTime:toDate("yyyy-MM-dd'T'HH:mm:ssZ"):format("yyyyMMddHHmmss")}
>>
>> I did some digging and found that on the UnpackContent page, it says:
>> file.lastModifiedTime  "The date and time that the unpacked file was
>> last modified (*tar only*)."
>>
>> I need these file attributes for those files I extract from the zip. So
>> as an alternative I tried configuring an ExecuteStreamCommand processor
>> like this:
>> Command Arguments  -c;"unzip -p -q < -"
>> Command Path  /bin/bash
>> Argument Delimiter   ;
>>
>> It throws these errors:
>>
>> 16:41:30 UTCERROR13023d28-6154-17fd-b4e8-7a30b35980ca
>> ExecuteStreamCommand[id=13023d28-6154-17fd-b4e8-7a30b35980ca] Failed to
>> write flow file to stdin due to Broken pipe: java.io.IOException: Broken
>> pipe 16:41:30 UTCERROR13023d28-6154-17fd-b4e8-7a30b35980ca
>> ExecuteStreamCommand[id=13023d28-6154-17fd-b4e8-7a30b35980ca] Transferring
>> flow file FlowFile[filename=ABC DEF (1).zip] to nonzero status. Executable
>> command /bin/bash ended in an error: /bin/bash: -: No such file or directory
>>
>> It does not seem to be applying the unzip to the stdin of the ESC
>> processor. None of the files in the zip archive are output from ESC.
>>
>> What needs to be changed in my ESC configuration?
>>
>> Thank you in advance for any help.
>>
>>


ExecuteStreamCommand failing to unzip incoming flowfiles

2024-01-31 Thread James McMahon
I tried to use UnpackContent to extract the files within a zip file named
ABC DEF (1).zip. (the filename has spaces in its name).

UnpackContent seemed to work, but it did not preserve file attributes from
the files in the zip. For example, the  lastModifiedTime   is not available
so downstream I am unable to do
this: 
${file.lastModifiedTime:toDate("yyyy-MM-dd'T'HH:mm:ssZ"):format("yyyyMMddHHmmss")}

I did some digging and found that on the UnpackContent page, it says:
file.lastModifiedTime  "The date and time that the unpacked file was last
modified (*tar only*)."

I need these file attributes for those files I extract from the zip. So as
an alternative I tried configuring an ExecuteStreamCommand processor like
this:
Command Arguments  -c;"unzip -p -q < -"
Command Path  /bin/bash
Argument Delimiter   ;

It throws these errors:

16:41:30 UTCERROR13023d28-6154-17fd-b4e8-7a30b35980ca
ExecuteStreamCommand[id=13023d28-6154-17fd-b4e8-7a30b35980ca] Failed to
write flow file to stdin due to Broken pipe: java.io.IOException: Broken
pipe 16:41:30 UTCERROR13023d28-6154-17fd-b4e8-7a30b35980ca
ExecuteStreamCommand[id=13023d28-6154-17fd-b4e8-7a30b35980ca] Transferring
flow file FlowFile[filename=ABC DEF (1).zip] to nonzero status. Executable
command /bin/bash ended in an error: /bin/bash: -: No such file or directory

It does not seem to be applying the unzip to the stdin of the ESC
processor. None of the files in the zip archive are output from ESC.

What needs to be changed in my ESC configuration?

Thank you in advance for any help.


Re: Error on InvokeHTTP

2024-01-13 Thread James McMahon
Juan, to do as you suggest would I set Content-Encoding in the processor
configuration to utf-8? Does that achieve this?

Joe O., I don't see any property in the Properties of the InvokeHTTP
processor that allows me to expect and handle spaces in the header. I also
don't explicitly create a header: I GenerateFlowFile with a body that is
English text.  I IdentifyMimeType. I send that through InvokeHTTP.

On Fri, Jan 12, 2024 at 5:04 PM Juan Pablo Gardella <
gardellajuanpa...@gmail.com> wrote:

> it seems charset issue. if it is a json add charset=utf-8
>
> On Fri, Jan 12, 2024, 6:33 PM James McMahon  wrote:
>
>> I have a text flowfile that I am trying to send to a translation service
>> on a remote EC2 instance from my nifi instance on my EC2. I am failing
>> with only this somewhat-cryptic error:
>>
>> InvokeHTTP[id=a72e1727-3da0-1d6c-164b-e43c1426fd97] Routing to Failure
>> due to exception: Unexpected char 0x20 at 6 in header name: Socket Write
>> Timeout: java.lang.IllegalArgumentException: Unexpected char 0x20 at 6 in
>> header name: Socket Write Timeout
>>
>>
>> What does this mean? Is what I am sending from InvokeHTTP employing a header 
>> formatted in a way that is not expected?
>>
>>
>> I am using an InvokeHTTP version 1.16.3.
>>
>> Has anyone experienced a similar error?
>>
>>
>>
>>
>>


Error on InvokeHTTP

2024-01-12 Thread James McMahon
I have a text flowfile that I am trying to send to a translation service on
a remote EC2 instance from my nifi instance on my EC2. I am failing with
only this somewhat-cryptic error:

InvokeHTTP[id=a72e1727-3da0-1d6c-164b-e43c1426fd97] Routing to Failure due
to exception: Unexpected char 0x20 at 6 in header name: Socket Write
Timeout: java.lang.IllegalArgumentException: Unexpected char 0x20 at 6 in
header name: Socket Write Timeout


What does this mean? Is what I am sending from InvokeHTTP employing a
header formatted in a way that is not expected?


I am using an InvokeHTTP version 1.16.3.

Has anyone experienced a similar error?


Re: Extract from jars and nars

2024-01-01 Thread James McMahon
It does indeed work perfectly, thanks very much Matt. Digging into the
errors I was seeing, it turned out that the jar and nar archive files were
being mangled by me in a prior flow step while trying to parse file
signatures out of the file headers. I mangled the headers and so corrupted
my jars. When I fixed that problem and tried what you said, it worked very
well. Thank you.
Cheers,
Jim

On Sun, Dec 31, 2023 at 6:49 PM Matt Burgess  wrote:

> Jim,
>
> When you say you want to "avoid having to output them to a temp
> directory", does that include the content repo? If not you can use
> UnpackContent with a Packaging Type of zip. I tried on both JARs and
> NARs and it works.
>
> Regards,
> Matt
>
> On Sun, Dec 31, 2023 at 12:37 PM James McMahon 
> wrote:
> >
> > I have a NiFi flow that handles many jar and nar archive files as
> incoming flowfiles. I am trying to figure out a way I can extract files
> from these archives - for example, in most cases one incoming jar has a
> number of files in its archive. So one flowfile should yield N output
> flowfiles if there are N files in the archive.
> >
> > I do not have /usr/bin/jar on my system. I have read, though, that unzip
> can be employed to extract from jars, and I have that. So I am trying to
> use that.
> >
> > How can I configure an ExecuteStreamCommand processor to take an
> incoming flowfile as stdin, and output each member of the archive as one of
> N output flowfiles to stdout? Ideally I want to avoid having to output my
> streaming flowfile to a temporary  physical directory; I want to perform
> the extraction entirely in stream.
> >
> > I have used ExecuteStreamCommand before but can't recall how to get it
> to work for this use case.
> >
> > Thanks for any help.
>


Extract from jars and nars

2023-12-31 Thread James McMahon
I have a NiFi flow that handles many jar and nar archive files as incoming
flowfiles. I am trying to figure out a way I can extract files from these
archives - for example, in most cases one incoming jar has a number of
files in its archive. So one flowfile should yield N output flowfiles if
there are N files in the archive.

I do not have /usr/bin/jar on my system. I have read, though, that unzip
can be employed to extract from jars, and I have that. So I am trying to
use that.

How can I configure an ExecuteStreamCommand processor to take an incoming
flowfile as stdin, and output each member of the archive as one of N output
flowfiles to stdout? Ideally I want to avoid having to output my streaming
flowfile to a temporary  physical directory; I want to perform the
extraction entirely in stream.

I have used ExecuteStreamCommand before but can't recall how to get it to
work for this use case.

Thanks for any help.
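
(As a side note on the unzip route: a jar is a plain zip archive, so it can be
listed or extracted with unzip directly from a shell. The filename and output
directory below are placeholders:

unzip -l example.jar
unzip -o example.jar -d /tmp/example_extracted

Doing this fully in-stream is harder, since unzip generally needs a seekable
file rather than a pipe; the archive's central directory sits at the end.)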


Re: Configuring ExecuteStreamCommand on jar flowfiles

2023-12-03 Thread James McMahon
UnpackContent examples seem to require that I output the results of the
unpack to a directory outside of the nifi flow. Is it possible to unpack
the jar in the flow, keeping the results as new flowfiles in the output
stream?

On Sun, Dec 3, 2023 at 1:23 PM James Srinivasan 
wrote:

> Since a jar file is mostly just a standard zip file, can you use a built
> in processor instead?
>
> On Sun, 3 Dec 2023, 15:36 James McMahon,  wrote:
>
>> I have a large volume of a wide variety of incoming data files. A subset
>> of these are jar files. Can the ExecuteStreamCommand be configured to run
>> the equivalent of
>>
>> jar -xf ${flowfile}
>>
>> and will that automatically direct each output file to a new flowfile, or
>> does ESC need to be told to direct each output file from jar standard out
>> to the Success path out of ESC?
>>
>> Thank you in advance for any assistance.
>>
>


Configuring ExecuteStreamCommand on jar flowfiles

2023-12-03 Thread James McMahon
I have a large volume of a wide variety of incoming data files. A subset of
these are jar files. Can the ExecuteStreamCommand be configured to run the
equivalent of

jar -xf ${flowfile}

and will that automatically direct each output file to a new flowfile, or
does ESC need to be told to direct each output file from jar standard out
to the Success path out of ESC?

Thank you in advance for any assistance.


Apache Tika compatible with NiFi 1.16.3

2023-11-12 Thread James McMahon
Where can I look to determine which Apache Tika version should be
downloaded to add to my lib directory for my Apache NiFi 1.16.3
installation?


ExecuteStreamCommand fails to run file against flowfile

2023-11-04 Thread James McMahon
I am having some difficulty getting the file command to run from
ExecuteStreamCommand, and am hoping someone can see my error.

At my linux command line,
file --brief myfile.mdb
returns
Microsoft Access Database
as expected.

I am reading files into a nifi flow, and trying to apply
ExecuteStreamCommand to run the file command against them. I configured my
processor like this:
Command Arguments  --brief;-
Command Path/usr/bin/file
Ignore STDIN   false
Argument Delimiter  ;
Output destination attribute  fType
Max Attribute Length  256

fType gets the value  data
for the mdb files, not Microsoft Access Database

How can I correct this? Can I configure my processor differently to get the
file command to return the expected result? I'm hoping to avoid writing
flowfile out to a /tmp directory by figuring out how to get file to run
against the stream.
Thank you in advance.
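
One workaround to consider: take the pipe out of the picture by staging the
stream to a temporary file and running file(1) against that, along the lines of
the mktemp pattern used elsewhere on this list. An untested sketch, with Command
Path=/bin/bash, Command Arguments pointing at the script, and Output destination
attribute left as fType:

#!/bin/bash
# Stage the incoming flowfile content so file(1) sees a real, seekable file.
tmpfile=$(mktemp)
cat /dev/stdin > "$tmpfile"
/usr/bin/file --brief "$tmpfile"
rm -f "$tmpfile"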


Re: curl from ExecuteStreamCommand

2023-10-22 Thread James McMahon
This looks promising, Lehel. Visiting my OpenSearch home page in the AWS
dashboard, I do see that I have an IAM role associated with it. That role
is AWSServiceRoleForAmazonOpenSearchService.

I select that role, but don't see that I have an ACCESS_KEY or SECRET_KEY
associated with it. Its Type is AWS Managed. Looking at the top of the
dashboard for this role, it appears that the role has a REGION of Global.
My OpenSearch Service has a REGION of US East (N. Virginia).

When I created my OpenSearch service, I did so only with a master user name
and master user password. Should I instead explicitly create access and
secret keys for my Amazon OpenSearch Service? Can you say a few words
regarding how I get to these keys?

On Sat, Oct 21, 2023 at 9:51 PM Lehel Boér  wrote:

> Hi James,
>
> I'm not sure if the username/password authentication is enough in this
> case. The AWS CLI automatically handles the authentication and
> authorization for you, using the credentials you have configured for your
> CLI. This looks like an authorization issue between curl and the AWS
> service.
> Curl supports *--aws-sigv4* requests which you can use with the access
> key of the IAM role set in OpenSearch. I managed to get it working for GET
> requests.
> https://how.wtf/aws-sigv4-requests-with-curl.html
>
> -XGET;https://$DOMAIN/$PATH;-H;'Content-Type:
> application/json';--user;$ACCESS_KEY:$SECRET_KEY;--aws-sigv4;aws:amz:$REGION:es
>
> Kind Regards,
> Lehel
>
> --
> *From:* James McMahon 
> *Sent:* Saturday, October 21, 2023 20:25
> *To:* users 
> *Subject:* curl from ExecuteStreamCommand
>
> I have tested this curl from my ec2 command line:
> curl -XPUT -u 'myusernm:myuserpw' '
> https://vpc-rampart-test-opensearch-nrqyb7jjpvmji6cp2qcvmyhcgq.us-east-1.es.amazonaws.com/movies/_doc/1'
> -d '{"director": "Burton, Tim", "genre": ["Comedy","Sci-Fi"], "year": 1996,
> "actor": ["Jack Nicholson","Pierce Brosnan","Sarah Jessica Parker"],
> "title": "Mars Attacks!"}' -H 'Content-Type: application/json'
>
> It successfully puts the json into my Amazon OpenSearch domain.
>
> That domain in the URL above is the Domain endpoint shown on the AWS
> dashboard for its OpenSearch service.
>
> In NiFi the JSON is my flowfile content. I am trying to get my
> ExecuteStreamCommand to run the curl, but it fails. NiFi indicates it gets
> back this:
>   "message": "Request forbidden by administrative rules"
>
> This is how I have the processor configured.
> Command Arguments - -XINPUT;-u;myusernm:myuserpw;
> https://vpc-rampart-test-opensearch-nrqyb7jjpvmji6cp2qcvmyhcgq.us-east-1.es.amazonaws.com/movies/_doc/1;-H;'Content-Type:
> application/json'
> Command Path - /usr/bin/curl
> Ignore STDIN - false
> Argument Delimiter - ;
> Max Attribute Length - 256
>
> How can this be configured in the ExecuteStreamCommand processor to run
> successfully?
>
> If the ExecuteStreamCommand executes the command just as if we were at the
> command line, what is getting in the way here when I try to run this from
> NiFi?
>


curl from ExecuteStreamCommand

2023-10-21 Thread James McMahon
I have tested this curl from my ec2 command line:
curl -XPUT -u 'myusernm:myuserpw' '
https://vpc-rampart-test-opensearch-nrqyb7jjpvmji6cp2qcvmyhcgq.us-east-1.es.amazonaws.com/movies/_doc/1'
-d '{"director": "Burton, Tim", "genre": ["Comedy","Sci-Fi"], "year": 1996,
"actor": ["Jack Nicholson","Pierce Brosnan","Sarah Jessica Parker"],
"title": "Mars Attacks!"}' -H 'Content-Type: application/json'

It successfully puts the json into my Amazon OpenSearch domain.

That domain in the URL above is the Domain endpoint shown on the AWS
dashboard for its OpenSearch service.

In NiFi the JSON is my flowfile content. I am trying to get my
ExecuteStreamCommand to run the curl, but it fails. NiFi indicates it gets
back this:
  "message": "Request forbidden by administrative rules"

This is how I have the processor configured.
Command Arguments - -XINPUT;-u;myusernm:myuserpw;
https://vpc-rampart-test-opensearch-nrqyb7jjpvmji6cp2qcvmyhcgq.us-east-1.es.amazonaws.com/movies/_doc/1;-H;'Content-Type:
application/json'
Command Path - /usr/bin/curl
Ignore STDIN - false
Argument Delimiter - ;
Max Attribute Length - 256

How can this be configured in the ExecuteStreamCommand processor to run
successfully?

If the ExecuteStreamCommand executes the command just as if we were at the
command line, what is getting in the way here when I try to run this from
NiFi?


How to configure ConvertExcelToCSVProcessor processor?

2023-09-28 Thread James McMahon
I have an incoming xlsx file, with many sheets. I am trying to use
ConvertExcelToCSVProcessor processor to extract the sheets. It is currently
erroring when it extracts the header.

Here is what the header is in one of the sheets (commas added by me for
clarity):

Order,FAO,ISO3,Country/territory,Region,Sub-Region,Inc I,Inc II,Notes
I,Notes
II,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,,1961

This is what it should be:
Order,FAO,ISO3,Country/territory,Region,Sub-Region,Inc I,Inc II,Notes
I,Notes
II,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,
*2020,,1961*
,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020

Note that there is a gap column between the two sections - one section
provides data in one unit of measure, the second in hectares of arable land.

This is how I have my Convert processor configured:
Sheets to Extract  No value set
Number of Rows to Skip  0
Columns To Skip  No value set
Format Cell Values  true
CSV Format  Microsoft Excel
Include Header Line  true

How must I tune this configuration to successfully extract all the header
fields? And why would it make it past that gap column, only to choke after
the first column of the second section? It seems that if it was going to
give up the ghost, it would do it when it encounters that gap column.

How does this processor identify the row that is the header? I'd like to
know what it does in case I need to write my own Groovy script to pull this
data. What characteristic does it use? What heuristic does it apply to skip
all the comment rows that sometimes precede the header row.

Thank you.


Re: How can I View my flowfile records?

2023-09-25 Thread James McMahon
A couple of random questions about ConvertExcelToCSVProcessor:

Why does this processor only handle the xlsx Excel file format?  From the
Description for ConvertExcelToCSVProcessor:  "*This processor is currently
only capable of processing .xlsx (XSSF 2007 OOXML file format) Excel
documents and not older .xls (HSSF '97(-2007) file format) documents.*" I
ask because it seems unfortunate to have to develop a separate distinct
flow path to handle the .xls files that this native processor cannot. Why
was it that handling of xls Excel files was not baked into
ConvertExcelToCSVProcessor too? Do later releases lift this limitation?

What is it about this processor that required including the word Processor
in its name? It seems redundant and inconsistent with the naming convention
used for the majority of the other processors. I figure there was an
interesting reason behind this, and so wanted to ask.

I am using a slightly older version of NiFi. Does this limitation go away
in later versions?

On Mon, Sep 25, 2023 at 3:23 AM Chris Sampson 
wrote:

> I completely missed the fact that this was an external python conversion
> script through the ExecuteStreamCommand, but as Matt says, that will be
> catered for in the new NiFi versions.
>
> From a quick look, although I've not tested to confirm, it appears both
> the existing ConvertExcelToCSVProcessor and CSVRecordSetWriter (which can
> now be paired with the relatively new ExcelReader,  e.g. in a ConvertRecord
> processor) will both set the result flowfile's mime.type attribute as
> text/csv, which would allow the expected downstream content viewer
> behaviour.
>
> On Mon, 25 Sept 2023, 06:54 Matt Burgess,  wrote:
>
>> I added MIME Type properties to ExecuteProcess and ExecuteStream command
>> so you can set it explicitly if you want [1]. They will be in the 1.24.0
>> and 2.0 releases.
>>
>> Regards,
>> Matt
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-12011
>>
>>
>> On Mon, Sep 25, 2023 at 1:41 AM Joe Witt  wrote:
>>
>>>  Chris
>>>
>>> Yep. Though this case was ExecuteStreamCommand so following with
>>> UpdateAttr as you mention or IdentifyMimeType would do the trick.
>>>
>>> Thanks
>>>
>>> On Sun, Sep 24, 2023 at 10:30 PM Chris Sampson <
>>> chris.samp...@naimuri.com> wrote:
>>>
>>>> An UpdateAttribute could also be used to update the mime.type, e.g. to
>>>> text/csv.
>>>>
>>>> I'd think the csv record writer should probably do this automatically
>>>> though, so maybe worth a jira to correct that (I'm reasonably sure the
>>>> existing json and avro writers do that, for example).
>>>>
>>>> On Sun, 24 Sept 2023, 23:52 James McMahon, 
>>>> wrote:
>>>>
>>>>> That was it. I was missing the forest for the trees, yet again.
>>>>> I do all the hard work and then forget to IdentifyMimeType at the end.
>>>>> Thanks very much Joe.
>>>>> Jim
>>>>>
>>>>> On Sun, Sep 24, 2023 at 6:30 PM Joe Witt  wrote:
>>>>>
>>>>>> Jim,
>>>>>>
>>>>>> Before you try to view it you can likely run it through
>>>>>> IdentifyMimeType.  As you note the conversion from XLS to CSV happens but
>>>>>> we still see a mime type of 'application/vnd.
>>>>>> openxmlformats-officedocument.spreadsheetml.sheet' so that is likely
>>>>>> causing it to not even attempt to display.  So after your python script
>>>>>> execution run the data through IdentifyMimeType then you can likely view 
>>>>>> it
>>>>>> just fine.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Sun, Sep 24, 2023 at 3:21 PM James McMahon 
>>>>>> wrote:
>>>>>>
>>>>>>> I sure can Joe. Here they are:
>>>>>>>
>>>>>>> RouteOnAttribute.Route
>>>>>>> isExcel
>>>>>>> execution.command
>>>>>>> /usr/bin/python3
>>>>>>> execution.command.args
>>>>>>> /opt/nifi/config_resources/scripts/excelToCSV.py
>>>>>>> execution.error
>>>>>>> Empty string set
>>>>>>> execution.status
>>>>>>> 0
>>>>>>> filename
>>>>>>> Alltables.csv
>>>>>>> hash.value.md5
>>>>>>> b48840c161b645a0169e622dcb8f5083
>>>&

Re: How can I View my flowfile records?

2023-09-24 Thread James McMahon
That was it. I was missing the forest for the trees, yet again. I do
all the hard work and then forget to IdentifyMimeType at the end.
Thanks very much Joe.
Jim

On Sun, Sep 24, 2023 at 6:30 PM Joe Witt  wrote:

> Jim,
>
> Before you try to view it you can likely run it through IdentifyMimeType.
> As you note the conversion from XLS to CSV happens but we still see a mime
> type of 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
> so that is likely causing it to not even attempt to display.  So after your
> python script execution run the data through IdentifyMimeType then you can
> likely view it just fine.
>
> Thanks
>
> On Sun, Sep 24, 2023 at 3:21 PM James McMahon 
> wrote:
>
>> I sure can Joe. Here they are:
>>
>> RouteOnAttribute.Route
>> isExcel
>> execution.command
>> /usr/bin/python3
>> execution.command.args
>> /opt/nifi/config_resources/scripts/excelToCSV.py
>> execution.error
>> Empty string set
>> execution.status
>> 0
>> filename
>> Alltables.csv
>> hash.value.md5
>> b48840c161b645a0169e622dcb8f5083
>> hash.value.sha256
>> 4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
>> isChild
>> false
>> mime.extension
>> .xlsx
>> mime.type
>> application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
>> parent.MD5
>> b48840c161b645a0169e622dcb8f5083
>> parent.SHA256
>> 4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
>> path
>> ./
>> s3.bucket
>> rampart-raw-data
>> s3.encryptionStrategy
>> SSE_S3
>> s3.etag
>> b48840c161b645a0169e622dcb8f5083
>> s3.isLatest
>> true
>> s3.lastModified
>> 1672701227000
>> s3.length
>> 830934
>> s3.owner
>> b34a7aa80a4130503fee2e8d4c2b674e154af3c4db69db9a4e3bff8a47cc92d1
>> s3.sseAlgorithm
>> AES256
>> s3.storeClass
>> STANDARD
>> s3.version
>> null
>> sourcing.MD5
>> b48840c161b645a0169e622dcb8f5083
>> sourcing.SHA256
>> 4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
>> sourcing.sourceMD5
>> b48840c161b645a0169e622dcb8f5083
>> sourcing.sourceSHA256
>> 4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
>> triage.datatype
>> excel
>> uuid
>> d72ec2e9-cfbd-435e-9954-4f7fae55c550
>>
>> Thanks for any help. Perhaps my data is there but I simply can't render
>> it in the Viewer?
>> Jim
>>
>> On Sun, Sep 24, 2023 at 6:08 PM Joe Witt  wrote:
>>
>>> Jim,
>>>
>>> If a content type attribute exists and is not a type NiFi understands it
>>> will not be able to render it.  Can you show what flowfile attributes are
>>> present at the point you attempt to view it?
>>>
>>> Thanks
>>>
>>> On Sun, Sep 24, 2023 at 3:03 PM James McMahon 
>>> wrote:
>>>
>>>> Hello. I have converted incoming Excel files to csv. I'd like to look
>>>> at the result, but when I select my flowfiles from the output queue, I can
>>>> only select "View as hex" - but I cannot get the display to show me the
>>>> records in the form I expect. Viewing them using the hex display is not
>>>> helpful.
>>>>
>>>> How can I fix this viewing issue?
>>>>
>>>> Here is an example of what I can see:
>>>>
>>>> 0x 22 54 61 62 6C 65 20 31 2E 20 20 45 73 74 69 6D "Table 1.
>>>> Estim
>>>> 0x0010 61 74 65 64 20 4D 6F 6E 74 68 6C 79 20 53 61 6C ated
>>>> Monthly Sal
>>>> 0x0020 65 73 20 61 6E 64 20 49 6E 76 65 6E 74 6F 72 69 es and
>>>> Inventori
>>>> 0x0030 65 73 20 66 6F 72 20 4D 61 6E 75 66 61 63 74 75 es for
>>>> Manufactu
>>>> 0x0040 72 65 72 73 2C 20 52 65 74 61 69 6C 65 72 73 2C rers,
>>>> Retailers,
>>>> 0x0050 20 61 6E 64 20 4D 65 72 63 68 61 6E 74 20 57 68 and
>>>> Merchant Wh
>>>> 0x0060 6F 6C 65 73 61 6C 65 72 73 22 2C 55 6E 6E 61 6D
>>>> olesalers",Unnam
>>>> 0x0070 65 64 3A 20 31 2C 55 6E 6E 61 6D 65 64 3A 20 32 ed:
>>>> 1,Unnamed: 2
>>>> 0x0080 2C 55 6E 6E 61 6D 65 64 3A 20 33 2C 55 6E 6E 61 ,Unnamed:
>>>> 3,Unna
>>>> 0x0090 6D 65 64 3A 20 34 2C 55 6E 6E 61 6D 65 64 3A 20 med:
>>>> 4,Unnamed:
>>>> 0x00A0 35 2C 55 6E 6E 61 6D 65 64 3A 20 36 2C 55 6E 6E 5,Unnamed:
>>>> 6,Unn
>>>> 0x00B0 61 6D 65 64 3A 20 37 2C 55 6E 6E 61 6D 65 

Re: How can I View my flowfile records?

2023-09-24 Thread James McMahon
I sure can Joe. Here they are:

RouteOnAttribute.Route
isExcel
execution.command
/usr/bin/python3
execution.command.args
/opt/nifi/config_resources/scripts/excelToCSV.py
execution.error
Empty string set
execution.status
0
filename
Alltables.csv
hash.value.md5
b48840c161b645a0169e622dcb8f5083
hash.value.sha256
4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
isChild
false
mime.extension
.xlsx
mime.type
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
parent.MD5
b48840c161b645a0169e622dcb8f5083
parent.SHA256
4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
path
./
s3.bucket
rampart-raw-data
s3.encryptionStrategy
SSE_S3
s3.etag
b48840c161b645a0169e622dcb8f5083
s3.isLatest
true
s3.lastModified
1672701227000
s3.length
830934
s3.owner
b34a7aa80a4130503fee2e8d4c2b674e154af3c4db69db9a4e3bff8a47cc92d1
s3.sseAlgorithm
AES256
s3.storeClass
STANDARD
s3.version
null
sourcing.MD5
b48840c161b645a0169e622dcb8f5083
sourcing.SHA256
4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
sourcing.sourceMD5
b48840c161b645a0169e622dcb8f5083
sourcing.sourceSHA256
4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
triage.datatype
excel
uuid
d72ec2e9-cfbd-435e-9954-4f7fae55c550

Thanks for any help. Perhaps my data is there but I simply can't render it
in the Viewer?
Jim

On Sun, Sep 24, 2023 at 6:08 PM Joe Witt  wrote:

> Jim,
>
> If a content type attribute exists and is not a type NiFi understands it
> will not be able to render it.  Can you show what flowfile attributes are
> present at the point you attempt to view it?
>
> Thanks
>
> On Sun, Sep 24, 2023 at 3:03 PM James McMahon 
> wrote:
>
>> Hello. I have converted incoming Excel files to csv. I'd like to look at
>> the result, but when I select my flowfiles from the output queue, I can
>> only select "View as hex" - but I cannot get the display to show me the
>> records in the form I expect. Viewing them using the hex display is not
>> helpful.
>>
>> How can I fix this viewing issue?
>>
>> Here is an example of what I can see:
>>
>> 0x 22 54 61 62 6C 65 20 31 2E 20 20 45 73 74 69 6D "Table 1.
>> Estim
>> 0x0010 61 74 65 64 20 4D 6F 6E 74 68 6C 79 20 53 61 6C ated Monthly
>> Sal
>> 0x0020 65 73 20 61 6E 64 20 49 6E 76 65 6E 74 6F 72 69 es and
>> Inventori
>> 0x0030 65 73 20 66 6F 72 20 4D 61 6E 75 66 61 63 74 75 es for
>> Manufactu
>> 0x0040 72 65 72 73 2C 20 52 65 74 61 69 6C 65 72 73 2C rers,
>> Retailers,
>> 0x0050 20 61 6E 64 20 4D 65 72 63 68 61 6E 74 20 57 68 and Merchant
>> Wh
>> 0x0060 6F 6C 65 73 61 6C 65 72 73 22 2C 55 6E 6E 61 6D
>> olesalers",Unnam
>> 0x0070 65 64 3A 20 31 2C 55 6E 6E 61 6D 65 64 3A 20 32 ed:
>> 1,Unnamed: 2
>> 0x0080 2C 55 6E 6E 61 6D 65 64 3A 20 33 2C 55 6E 6E 61 ,Unnamed:
>> 3,Unna
>> 0x0090 6D 65 64 3A 20 34 2C 55 6E 6E 61 6D 65 64 3A 20 med:
>> 4,Unnamed:
>> 0x00A0 35 2C 55 6E 6E 61 6D 65 64 3A 20 36 2C 55 6E 6E 5,Unnamed:
>> 6,Unn
>> 0x00B0 61 6D 65 64 3A 20 37 2C 55 6E 6E 61 6D 65 64 3A amed:
>> 7,Unnamed:
>> 0x00C0 20 38 2C 55 6E 6E 61 6D 65 64 3A 20 39 2C 55 6E 8,Unnamed:
>> 9,Un
>> 0x00D0 6E 61 6D 65 64 3A 20 31 30 2C 55 6E 6E 61 6D 65 named:
>> 10,Unname
>> 0x00E0 64 3A 20 31 31 2C 55 6E 6E 61 6D 65 64 3A 20 31 d:
>> 11,Unnamed: 1
>> 0x00F0 32 0A 28 49 6E 20 6D 69 6C 6C 69 6F 6E 73 20 6F 2.(In
>> millions o
>> 0x0100 66 20 64 6F 6C 6C 61 72 73 29 2C 2C 2C 2C 2C 2C f
>> dollars),,
>> 0x0110 2C 2C 2C 2C 2C 2C 0A 2C 2C 2C 2C 2C 2C 2C 2C 2C
>> ,,.,
>> 0x0120 2C 2C 2C 0A 2C 53 61 6C 65 73 2C 30 2C 30 2C 49
>> ,,,.,Sales,0,0,I
>> 0x0130 6E 76 65 6E 74 6F 72 69 65 73 2C 30 2C 30 2C 49
>> nventories,0,0,I
>> 0x0140 6E 76 65 6E 74 6F 72 69 65 73 2F 53 61 6C 65 73
>> nventories/Sales
>> 0x0150 20 52 61 74 69 6F 73 2C 30 2C 30 2C 2C 2C 0A 2C
>> Ratios,0,0,,,.,
>> 0x0160 4F 63 74 2E 20 32 30 32 32 2C 53 65 70 2E 20 32 Oct.
>> 2022,Sep. 2
>> 0x0170 30 32 32 2C 4F 63 74 2E 20 32 30 32 31 2C 4F 63 022,Oct.
>> 2021,Oc
>> 0x0180 74 2E 20 32 30 32 32 2C 53 65 70 2E 20 32 30 32 t. 2022,Sep.
>> 202
>> 0x0190 32 2C 4F 63 74 2E 20 32 30 32 31 2C 4F 63 74 2E 2,Oct.
>> 2021,Oct.
>> 0x01A0 20 32 30 32 32 2C 53 65 70 2E 20 32 30 32 32 2C 2022,Sep.
>> 2022,
>> 0x01B0 4F 63 74 2E 20 32 30 32 31 2C 2C 2C 0A 2C 28 70 Oct.
>> 2021,,,.,(p
>> 0x01C0 29 2C 28 72 29 2C 28 72 29 2C 28 70 29 2C 28 72
>> ),(r),(r),(p),(r
>> 0x01D0 29 2C 28 72 29 2C 28 70 29 2C 28 72 29 2C 28 72
>> )

How can I View my flowfile records?

2023-09-24 Thread James McMahon
Hello. I have converted incoming Excel files to csv. I'd like to look at
the result, but when I select my flowfiles from the output queue, I can
only select "View as hex" - but I cannot get the display to show me the
records in the form I expect. Viewing them using the hex display is not
helpful.

How can I fix this viewing issue?

Here is an example of what I can see:

0x 22 54 61 62 6C 65 20 31 2E 20 20 45 73 74 69 6D "Table 1. Estim
0x0010 61 74 65 64 20 4D 6F 6E 74 68 6C 79 20 53 61 6C ated Monthly Sal
0x0020 65 73 20 61 6E 64 20 49 6E 76 65 6E 74 6F 72 69 es and Inventori
0x0030 65 73 20 66 6F 72 20 4D 61 6E 75 66 61 63 74 75 es for Manufactu
0x0040 72 65 72 73 2C 20 52 65 74 61 69 6C 65 72 73 2C rers, Retailers,
0x0050 20 61 6E 64 20 4D 65 72 63 68 61 6E 74 20 57 68 and Merchant Wh
0x0060 6F 6C 65 73 61 6C 65 72 73 22 2C 55 6E 6E 61 6D olesalers",Unnam
0x0070 65 64 3A 20 31 2C 55 6E 6E 61 6D 65 64 3A 20 32 ed: 1,Unnamed: 2
0x0080 2C 55 6E 6E 61 6D 65 64 3A 20 33 2C 55 6E 6E 61 ,Unnamed: 3,Unna
0x0090 6D 65 64 3A 20 34 2C 55 6E 6E 61 6D 65 64 3A 20 med: 4,Unnamed:
0x00A0 35 2C 55 6E 6E 61 6D 65 64 3A 20 36 2C 55 6E 6E 5,Unnamed: 6,Unn
0x00B0 61 6D 65 64 3A 20 37 2C 55 6E 6E 61 6D 65 64 3A amed: 7,Unnamed:
0x00C0 20 38 2C 55 6E 6E 61 6D 65 64 3A 20 39 2C 55 6E 8,Unnamed: 9,Un
0x00D0 6E 61 6D 65 64 3A 20 31 30 2C 55 6E 6E 61 6D 65 named: 10,Unname
0x00E0 64 3A 20 31 31 2C 55 6E 6E 61 6D 65 64 3A 20 31 d: 11,Unnamed: 1
0x00F0 32 0A 28 49 6E 20 6D 69 6C 6C 69 6F 6E 73 20 6F 2.(In millions o
0x0100 66 20 64 6F 6C 6C 61 72 73 29 2C 2C 2C 2C 2C 2C f dollars),,
0x0110 2C 2C 2C 2C 2C 2C 0A 2C 2C 2C 2C 2C 2C 2C 2C 2C ,,.,
0x0120 2C 2C 2C 0A 2C 53 61 6C 65 73 2C 30 2C 30 2C 49 ,,,.,Sales,0,0,I
0x0130 6E 76 65 6E 74 6F 72 69 65 73 2C 30 2C 30 2C 49 nventories,0,0,I
0x0140 6E 76 65 6E 74 6F 72 69 65 73 2F 53 61 6C 65 73 nventories/Sales
0x0150 20 52 61 74 69 6F 73 2C 30 2C 30 2C 2C 2C 0A 2C Ratios,0,0,,,.,
0x0160 4F 63 74 2E 20 32 30 32 32 2C 53 65 70 2E 20 32 Oct. 2022,Sep. 2
0x0170 30 32 32 2C 4F 63 74 2E 20 32 30 32 31 2C 4F 63 022,Oct. 2021,Oc
0x0180 74 2E 20 32 30 32 32 2C 53 65 70 2E 20 32 30 32 t. 2022,Sep. 202
0x0190 32 2C 4F 63 74 2E 20 32 30 32 31 2C 4F 63 74 2E 2,Oct. 2021,Oct.
0x01A0 20 32 30 32 32 2C 53 65 70 2E 20 32 30 32 32 2C 2022,Sep. 2022,
0x01B0 4F 63 74 2E 20 32 30 32 31 2C 2C 2C 0A 2C 28 70 Oct. 2021,,,.,(p
0x01C0 29 2C 28 72 29 2C 28 72 29 2C 28 70 29 2C 28 72 ),(r),(r),(p),(r
0x01D0 29 2C 28 72 29 2C 28 70 29 2C 28 72 29 2C 28 72 ),(r),(p),(r),(r
0x01E0 29 2C 2C 2C 0A 20 41 64 6A 75 73 74 65 64 31 2C ),,,. Adjusted1,
0x01F0 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 2C 0A 20 20 20 20 ,,,.
0x0200 20 20 20 20 54 6F 74 61 6C 20 62 75 73 69 6E 65 Total busine
0x0210 73 73 E2 80 A6 E2 80 A6 E2 80 A6 E2 80 A6 E2 80 ss..
0x0220 A6 E2 80 A6 E2 80 A6 E2 80 A6 E2 80 A6 E2 80 A6 
0x0230 E2 80 A6 E2 80 A6 E2 80 A6 E2 80 A6 E2 80 A6 E2 
0x0240 80 A6 E2 80 A6 E2 80 A6 E2 80 A6 E2 80 A6 E2 80 
0x0250 A6 2C 31 38 35 39 34 38 36 2C 31 38 34 35 32 32 .,1859486,184522

and it continues like this.

Thanks in advance for your help.

Jim.


SQL query to count fields with values.

2023-09-07 Thread James McMahon
I have converted incoming Excel files to csv files. My incoming files vary
in structure, and so do have different headers and also different
field names. I have the header field names in an attribute.

How can I employ a QueryRecord or an ExecuteSQL processor to return a count
for each field of records that have values in them (non-whitespace,
non-empty, non-null values)? They vary by flowfile, so the field names
would have to be read from the attribute into the SQL query. I need my
query to be dynamic to the field structure, and cannot fix them by name
because they change depending on the raw data.

Using Query Record seems to depend on a Record Reader and on a Record
Writer. Is it possible to configure these dynamically if every flowfile
will have variations in field names and in number of fields in its header?


Re: BufferedReader best option to search through large flowfiles?

2023-06-05 Thread James McMahon
Thank you very much Mark and Lars. Ideally I do prefer to employ standard
"out of the box" processors. In this case my requirement is to identify
bounding dates across all content in the flowfile. As I match my DT
patterns, I'll add the tokens to a groovy list that I can later sort and
use to identify the extreme values. (I may actually throw out the extremes
to ensure I'm not working with an outlier that is an error). I know how to
make those manipulations in a groovy script. I don't know how to accomplish
them using standard processors.

Mark, for future reference is there a risk when using RouteText that a huge
flowfile might exhaust jvm or repo resources? Is there such a risk for the
ExtractText, ReplaceText, and RouteOnContent processors mentioned by Lars?

Jim

On Mon, Jun 5, 2023 at 8:25 AM Mark Payne  wrote:

> Jim,
>
> Take a look at RouteText.
>
> Thanks
> -Mark
>
>
> > On Jun 5, 2023, at 8:09 AM, James McMahon  wrote:
> >
> > Hello. I have a requirement to scan for multiple regex patterns in very
> large flowfiles. Given that my flowfiles can be very large, I think my best
> approach is to employ an ExecuteGroovyScript processor and a script using a
> BufferedReader to scan the file one line at a time.
> >
> > I am concerned that I might exhaust jvm resources trying to otherwise
> process large content if I try to handle it all at once. Is a
> BufferedReader the right call? Does anyone recommend a better approach?
> >
> > Thanks in advance,
> > Jim
>
>


Re: Adjusting FlattenJson output

2023-06-01 Thread James McMahon
Very interesting re your discovery. Thanks very much for sharing that Matt.
I do want to keep the schema intact. I will try ScriptedTransformRecord.
Thanks again for replying Matt.
Cheers,
Jim

On Thu, Jun 1, 2023 at 10:13 AM Matt Burgess  wrote:

> Jim,
>
> I tried to use Jolt for this but I found in the doc that if you try to
> set an empty array or map to null or the empty string it will retain
> the empty array or map (no idea why). Since you know the name of the
> fields (and I assume want to keep the schema intact) you can use
> ScriptedTransformRecord to get at those fields by name and set them to
> null.
>
> Regards,
> Matt
>
> On Mon, May 29, 2023 at 10:38 AM James McMahon 
> wrote:
> >
> > I have incoming JSON data that begins like this, and that I am trying to
> flatten with FlattenJSON v1.16.3:
> >
> > {
> >   "meta" : {
> > "view" : {
> >   "id" : "kku6-nxdu",
> >   "name" : "Demographic Statistics By Zip Code",
> >   "assetType" : "dataset",
> >   "attribution" : "Department of Youth and Community Development
> (DYCD)",
> >   "averageRating" : 0,
> >   "category" : "City Government",
> >   "createdAt" : 1311775554,
> >   "description" : "Demographic statistics broken down by zip code",
> >   "displayType" : "table",
> >   "downloadCount" : 1017553,
> >   "hideFromCatalog" : false,
> >   "hideFromDataJson" : false,
> >   "indexUpdatedAt" : 1536596131,
> >   "newBackend" : true,
> >   "numberOfComments" : 3,
> >   "oid" : 4208790,
> >   "provenance" : "official",
> >   "publicationAppendEnabled" : false,
> >   "publicationDate" : 1372266760,
> >   "publicationGroup" : 238846,
> >   "publicationStage" : "published",
> >   "rowClass" : "",
> >   "rowsUpdatedAt" : 1372266747,
> >   "rowsUpdatedBy" : "uurm-7z6x",
> >   "tableId" : 942474,
> >   "totalTimesRated" : 0,
> >   "viewCount" : 70554,
> >   "viewLastModified" : 1652135219,
> >   "viewType" : "tabular",
> >   "approvals" : [ {
> > "reviewedAt" : 1372266760,
> > "reviewedAutomatically" : true,
> > "state" : "approved",
> > "submissionId" : 1064760,
> > "submissionObject" : "public_audience_request",
> > "submissionOutcome" : "change_audience",
> > "submittedAt" : 1372266760,
> > "workflowId" : 2285,
> > "submissionDetails" : {
> >   "permissionType" : "READ"
> > },
> > "submissionOutcomeApplication" : {
> >   "failureCount" : 0,
> >   "status" : "success"
> > },
> > "submitter" : {
> >   "id" : "5fuc-pqz2",
> >   "displayName" : "NYC OpenData"
> > }
> >   } ],
> >   "clientContext" : {
> > "clientContextVariables" : [ ],
> > "inheritedVariables" : { }
> >   },
> >   "columns" : [ {
> > "id" : -1,
> > "name" : "sid",
> > "dataTypeName" : "meta_data",
> > "fieldName" : ":sid",
> > "position" : 0,
> > "renderTypeName" : "meta_data",
> > "format" : { },
> > "flags" : [ "hidden" ]
> >   }, { .
> >
> > This is my configuration of my FlattenJson processor:
> > Separator  .
> > Flatten modenormal
> > Ignore Reserved Charactersfalse
> > Return Type  flatten
> > Character Set   UTF-8
> > Pretty Print JSON true
> >
> > Those lines in red appear in my output like this:
> >   "meta.view.clientContext.clientContextVariables" : [  ],
> >   "meta.view.clientContext.inheritedVariables" : {
> >
> >   },
> >
> > I don't want to preserve the empty list and empty map. I want to set the
> values for these keys to the empty string or null is acceptable.
> >
> > Can I do this natively through the FlattenJson configuration? If not,
> what would be the most efficient means to post-process to what I seek in my
> flow?
> >
> > Thanks in advance for any help.
>


Adjusting FlattenJson output

2023-05-29 Thread James McMahon
I have incoming JSON data that begins like this, and that I am trying to
flatten with FlattenJSON v1.16.3:

{
  "meta" : {
"view" : {
  "id" : "kku6-nxdu",
  "name" : "Demographic Statistics By Zip Code",
  "assetType" : "dataset",
  "attribution" : "Department of Youth and Community Development
(DYCD)",
  "averageRating" : 0,
  "category" : "City Government",
  "createdAt" : 1311775554,
  "description" : "Demographic statistics broken down by zip code",
  "displayType" : "table",
  "downloadCount" : 1017553,
  "hideFromCatalog" : false,
  "hideFromDataJson" : false,
  "indexUpdatedAt" : 1536596131,
  "newBackend" : true,
  "numberOfComments" : 3,
  "oid" : 4208790,
  "provenance" : "official",
  "publicationAppendEnabled" : false,
  "publicationDate" : 1372266760,
  "publicationGroup" : 238846,
  "publicationStage" : "published",
  "rowClass" : "",
  "rowsUpdatedAt" : 1372266747,
  "rowsUpdatedBy" : "uurm-7z6x",
  "tableId" : 942474,
  "totalTimesRated" : 0,
  "viewCount" : 70554,
  "viewLastModified" : 1652135219,
  "viewType" : "tabular",
  "approvals" : [ {
"reviewedAt" : 1372266760,
"reviewedAutomatically" : true,
"state" : "approved",
"submissionId" : 1064760,
"submissionObject" : "public_audience_request",
"submissionOutcome" : "change_audience",
"submittedAt" : 1372266760,
"workflowId" : 2285,
"submissionDetails" : {
  "permissionType" : "READ"
},
"submissionOutcomeApplication" : {
  "failureCount" : 0,
  "status" : "success"
},
"submitter" : {
  "id" : "5fuc-pqz2",
  "displayName" : "NYC OpenData"
}
  } ],
  "clientContext" : {
    "clientContextVariables" : [ ],
    "inheritedVariables" : { }
  },
  "columns" : [ {
"id" : -1,
"name" : "sid",
"dataTypeName" : "meta_data",
"fieldName" : ":sid",
"position" : 0,
"renderTypeName" : "meta_data",
"format" : { },
"flags" : [ "hidden" ]
  }, { .

This is my configuration of my FlattenJson processor:
Separator  .
Flatten modenormal
Ignore Reserved Charactersfalse
Return Type  flatten
Character Set   UTF-8
Pretty Print JSON true

Those clientContext lines appear in my output like this:
  "meta.view.clientContext.clientContextVariables" : [  ],
  "meta.view.clientContext.inheritedVariables" : {

  },

I don't want to preserve the empty list and empty map. I want to set the
values for these keys to the empty string or null is acceptable.

Can I do this natively through the FlattenJson configuration? If not, what
would be the most efficient means to post-process to what I seek in my flow?

Thanks in advance for any help.


How to determine groovy version

2023-05-13 Thread James McMahon
I'm using a series of Groovy scripts running from NiFi v1.16.3
ExecuteScript. How can I determine which version of Groovy is baked into
NiFi v1.16.3?
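
One quick check, sketched below, is to ask the runtime itself from an ExecuteScript processor set to Groovy. GroovySystem.version is part of Groovy; the attribute name used here is only an example. Another option is to look for the groovy-*.jar bundled with the scripting NAR under NiFi's work/nar directory.

// Writes the embedded Groovy version to an attribute on a new flowfile.
def flowFile = session.create()
flowFile = session.putAttribute(flowFile, 'groovy.version', GroovySystem.version)
log.warn("Groovy version is ${GroovySystem.version}")
session.transfer(flowFile, REL_SUCCESS)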


Re: Generalizing QueryRecord to changing inferred CSV headers

2023-04-23 Thread James McMahon
I did as you suggested Matt and attacked the problem with Groovy. Since I'm
stronger at manipulating json than csv using Groovy, I first converted the
incoming to json using a ConvertRecord processor. I inferred the record
schema from the header assumed in the incoming csv. (still tbd: how I will
handle csv files that lack any header).

If anyone has a similar requirement in the future here is some Groovy code
that gets the job done. It could probably use some tidying up:

import groovy.json.JsonSlurper
import groovy.json.JsonOutput
import org.apache.commons.io.IOUtils
import java.nio.charset.StandardCharsets

def tallyMap = [:]
def topValuesMap = [:]
def lineCount = 0

def ff = session.get()
if (!ff) return
try {
    session.read(ff, { inputStream ->
        def json = new JsonSlurper().parseText(IOUtils.toString(inputStream, StandardCharsets.UTF_8))

        json.each { jsonObj ->
            jsonObj.each { key, value ->
                if (value != null && !value.toString().trim().isEmpty()) {
                    tallyMap[key] = tallyMap.containsKey(key) ? tallyMap[key] + 1 : 1
                    if (topValuesMap.containsKey(key)) {
                        def valuesMap = topValuesMap[key]
                        valuesMap[value] = valuesMap.containsKey(value) ? valuesMap[value] + 1 : 1
                        topValuesMap[key] = valuesMap
                    } else {
                        // First time this key is seen: seed its value-count map
                        // (parenthesized key so the variable's value is used, not the literal "value")
                        topValuesMap[key] = [(value): 1]
                    }
                }
            }
        }

        // Sort the topValuesMap for each key based on the frequency of values
        topValuesMap.each { key, valuesMap ->
            topValuesMap[key] = valuesMap.sort { -it.value }.take(10)
        }

        // Count the number of JSON records
        lineCount += json.size()
    } as InputStreamCallback)

    def tallyMapString = JsonOutput.prettyPrint(JsonOutput.toJson(tallyMap))
    def topValuesMapString = JsonOutput.prettyPrint(JsonOutput.toJson(topValuesMap))

    ff = session.putAttribute(ff, 'triage.csv.tallyMap', tallyMapString)
    ff = session.putAttribute(ff, 'triage.csv.topValuesMap', topValuesMapString)
    ff = session.putAttribute(ff, 'triage.csv.lineCount', lineCount.toString())
    session.transfer(ff, REL_SUCCESS)

} catch (Exception e) {
    log.error('Error processing csv fields', e)
    session.transfer(ff, REL_FAILURE)
}

Incoming CSV looks like this (sample csv data from data.gov):

*Bank Name ,City ,State ,Cert ,Acquiring Institution ,Closing Date ,Fund*
Almena State Bank,Almena,KS,15426,Equity Bank,23-Oct-20,10538
First City Bank of Florida,Fort Walton Beach,FL,16748,"United Fidelity
Bank, fsb",16-Oct-20,10537
The First State Bank,Barboursville,WV,14361,"MVB Bank, Inc.",3-Apr-20,10536
Ericson State Bank,Ericson,NE,18265,Farmers and Merchants
Bank,14-Feb-20,10535
City National Bank of New Jersey,Newark,NJ,2,Industrial
Bank,1-Nov-19,10534
...

Here is the field analysis output that results:
triage.csv.lineCount
563
triage.csv.tallyMap
{ "Bank Name\ufffd": 563, "City\ufffd": 563, "State\ufffd": 563,
"Cert\ufffd": 563, "Acquiring Institution\ufffd": 563, "Closing
Date\ufffd": 563, "Fund": 563 }
triage.csv.topValuesMap
{ "Bank Name\ufffd": { "The First State Bank": 3, "Premier Bank": 3, "First
State Bank": 3, "Horizon Bank": 3, "Valley Bank": 2, "Frontier Bank": 2,
"Summit Bank": 2, "The Park Avenue Bank": 2, "Legacy Bank": 2, "First
National Bank": 2 }, "City\ufffd": { "Chicago": 20, "Atlanta": 10,
"Phoenix": 6, "Naples": 5, "Scottsdale": 4, "Las Vegas": 4, "Bradenton": 4,
"Miami": 4, "Los Angeles": 4, "Alpharetta": 4 }, "State\ufffd": { "GA": 93,
"FL": 76, "IL": 69, "CA": 41, "MN": 23, "WA": 19, "AZ": 16, "MO": 16, "MI":
14, "TX": 13 }, "Cert\ufffd": { "16748": 1, "14361": 1, "18265": 1,
"2": 1, "58317": 1, "58112": 1, "10716": 1, "30570": 1, "17719": 1,
"1802": 1 }, "Acquiring Institution\ufffd": { "No Acquirer": 31, "State
Bank and Trust Company": 12, "First-Citizens Bank & Trust Company": 11,
"Ameris Bank": 10, "U.S. Bank N.A.": 9, "Community & Southern Bank": 8,
"Centennial Bank": 7, &q

Re: Generalizing QueryRecord to changing inferred CSV headers

2023-04-18 Thread James McMahon
Thanks very much for your reply, Matt. Yes sir, a Groovy script is my
fallback option. Because we would rather build flows using "out of the NiFi
box" processors instead of custom scripts that may need to be maintained, I
was saving that as my last resort. But I do believe I can do it with Groovy.

I have a sample CSV data set of bank info I grabbed from data.gov. I've
successfully used ConvertRecord to convert from csv to json (Groovy maps
work really well with json). Using Groovy, I've identified my fields:

Bank Name�,City�,Closing Date�,Fund,Acquiring Institution�,Cert�,State�
 (That weird character at the end of each field is in the data set. Not sure why.)
Here are a few initial records in my massaged result:
[ {
  "Bank Name�" : "Almena State Bank",
  "City�" : "Almena",
  "State�" : "KS",
  "Cert�" : "15426",
  "Acquiring Institution�" : "Equity Bank",
  "Closing Date�" : "23-Oct-20",
  "Fund" : "10538"
}, {
  "Bank Name�" : "First City Bank of Florida",
  "City�" : "Fort Walton Beach",
  "State�" : "FL",
  "Cert�" : "16748",
  "Acquiring Institution�" : "United Fidelity Bank, fsb",
  "Closing Date�" : "16-Oct-20",
  "Fund" : "10537"
},.
I am going to try iterating through that list of fields using Groovy,
keeping a map with field name as its key and a value that is a map of
"field value": "count", sorted. I can then extract all sorts of valuable
metadata.
In any case thank you for the reply. Guess I'll go the custom Groovy script
route.
Jim
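
As a reference point, the ScriptedTransformRecord route Matt mentions below can walk whatever fields the reader inferred without naming them up front. A rough, untested Groovy sketch follows; the trim is only a stand-in for whatever per-field work is needed.

// ScriptedTransformRecord body (Groovy): iterate every field in the inferred schema.
// 'record' is bound by the processor; the per-field action is illustrative only.
record.schema.fieldNames.each { name ->
    def value = record.getValue(name)
    if (value instanceof String) {
        record.setValue(name, value.trim())
    }
}
return record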

On Tue, Apr 18, 2023 at 6:17 PM Matt Burgess  wrote:

> Jim,
>
> QueryRecord uses Apache Calcite under the hood and is thus at the
> mercy of the SQL standard (and any additional rules/dialect from
> Apache Calcite) so in general you can't select "all except X" or "all
> except change X to Y". Does it need to be SQL executed against the
> individual fields? If not, take a look at ScriptedTransformRecord doc
> (and its Additional Details page). IIRC you're a Groovy guy now ;) so
> you should be able to alter the fields as you see fit using Groovy
> rather than SQL (alternatively Jython as you've done a bunch of that
> as well).
>
> Regards,
> Matt
>
> On Tue, Apr 18, 2023 at 6:04 PM James McMahon 
> wrote:
> >
> > Hello. I recently asked the community a question about processing CSV
> files. I received some helpful advice about using processors such as
> ConvertRecord and QueryRecord, and was encouraged to employ Readers and
> RecordSetWriters. I've done that, and thank all who replied.
> >
> > My incoming CSV files come in with different headers because they are
> widely different data sets. The header structure is not known in advance.
> As such, I configure a QueryRecord processor with a CSVReader that employs
> a Schema Access Strategy that is Use String Fields From Header. I configure
> a CSVRecordSetWriter that sets Infer Record Schema as its Schema Access
> Strategy.
> >
> > Now I want to use that QueryRecord processor to characterize the various
> fields using SQL. Record counts, min and max values - things of that
> nature. But in all the examples I find in YouTube and in the open source,
> the authors presume a knowledge of the fields in advance. For example
> Property year is set by Value select "year" from FLOWFILE.
> >
> > We simply don't have that luxury, that awareness in advance. After all,
> that's the very reason we inferred the schema in the reader and writer
> configuration. The fields are more often than not going to be very
> different. Hard wiring them into QueryRecord is not a flow solution that is
> flexible enough. We need to grab them from the inferred schema the Reader
> and Writer services identified.
> >
> > What syntax or notation can we use in the QueryRecord sql to say "for
> each field found in the header, execute this sql against that field"? I
> guess what I'm looking for is iteration through all the inferred schema
> fields, and dynamic assignment of the field name in the SQL.
> >
> > Has anyone faced this same challenge? How did you solve it?
> > Is there another way to approach this problem?
> >
> > Thank you in advance,
> > Jim
>


Generalizing QueryRecord to changing inferred CSV headers

2023-04-18 Thread James McMahon
Hello. I recently asked the community a question about processing CSV
files. I received some helpful advice about using processors such as
ConvertRecord and QueryRecord, and was encouraged to employ Readers and
RecordSetWriters. I've done that, and thank all who replied.

My incoming CSV files come in with different headers because they are
widely different data sets. The header structure is not known in advance.
As such, I configure a QueryRecord processor with a CSVReader that employs
a Schema Access Strategy that is *Use String Fields From Header*. I
configure a CSVRecordSetWriter that sets *Infer Record Schema* as its
Schema Access Strategy.

Now I want to use that QueryRecord processor to characterize the various
fields using SQL. Record counts, min and max values - things of that
nature. But in all the examples I find in YouTube and in the open source,
the authors presume a knowledge of the fields in advance. For example
Property year is set by Value select "year" from FLOWFILE.

We simply don't have that luxury, that awareness in advance. After all,
that's the very reason we inferred the schema in the reader and writer
configuration. The fields are more often than not going to be very
different. Hard wiring them into QueryRecord is not a flow solution that is
flexible enough. We need to grab them from the inferred schema the Reader
and Writer services identified.

What syntax or notation can we use in the QueryRecord sql to say "for each
field found in the header, execute this sql against that field"? I guess
what I'm looking for is iteration through all the inferred schema fields,
and dynamic assignment of the field name in the SQL.

Has anyone faced this same challenge? How did you solve it?
Is there another way to approach this problem?

Thank you in advance,
Jim


Re: Handling CSVs dynamically with NiFi

2023-04-12 Thread James McMahon
Very cool. That sounds very promising. Thank you again Isha.
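
For later readers, a PartitionRecord setup along the lines Isha sketches below is just a handful of user-defined properties whose values are RecordPaths; the names and paths here are made up for illustration:

state      /state
city       /city
zipcode    /zipcode

Each property name becomes an attribute on the outgoing flowfiles carrying that partition's value, so RouteOnAttribute or a dynamic directory path can pick it up downstream.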

On Wed, Apr 12, 2023 at 9:23 AM Isha Lamboo 
wrote:

> Hi James,
>
>
>
> I’ve overlooked an even better option: PartitionRecord.
>
>
>
> This allows you to add custom properties representing fields in the data.
> The processor will then put records into flowfiles that contain the same
> combination of your properties. So if you have say “City” with a recordpath
> of /sale/store/city it should split the data and spit out a file for each
> city value found.
>
>
>
> Depending on your N outputs, you may be able to do this in one go or chain
> a few Processors (say “state”, then “city” then “zipcode”).
>
>
>
> Regards,
>
>
>
> Isha
>
>
>
> *Van:* James McMahon 
> *Verzonden:* woensdag 12 april 2023 14:56
> *Aan:* users@nifi.apache.org
> *Onderwerp:* Re: Handling CSVs dynamically with NiFi
>
>
>
> Thank you very much Isha. This is helpful. Assuming I wanted to route to N
> different output paths, does it follow that I need to use N different Query
> Record processors tailored to filter for just one subset?
>
> I'll have to experiment with it to develop more of a feel for how it can
> be used.
>
> Thanks again for taking a moment to reply with the suggestion.
>
> Jim
>
>
>
> On Wed, Apr 12, 2023 at 6:54 AM Isha Lamboo <
> isha.lam...@virtualsciences.nl> wrote:
>
> Hi James,
>
>
>
> One option you can use is the QueryRecord processor. It allows you to
> filter records with a SQL-like query for any combination of fields that
> your downstream tools require. You can add one for each different output
> required and send a copy of the main json file to each.
>
>
>
> This approach should work better if you have a limited number of different
> output files with many records each. If your goal is hundreds of different
> json files with a handful of records each, then splitting per row might be
> quicker than copying the entire json file that many times.
>
>
>
> Regards,
>
>
>
> Isha
>
>
>
> *Van:* James McMahon 
> *Verzonden:* vrijdag 7 april 2023 17:14
> *Aan:* users@nifi.apache.org
> *Onderwerp:* Re: Handling CSVs dynamically with NiFi
>
>
>
> Hello Bryan. Thank you for your question.
>
> A downstream consumer requires the complete set in json. So that's part of
> why I convert.
>
> Other downstream tools require json input, but not the entire set. The
> data needs to be routed based on certain features. Geographic location.
> Sales data by zip codes. Etc, etc. Splitting the records out seemed to be a
> reasonable option to route individual records.
>
> I appreciate you taking the time to ask. You are far more familiar with
> nifi best practices than me. If there is a better way than what I intended,
> please fire away. I'd love to march down a better path if there is one.
>
> Cheers,
>
> Jim
>
>
>
> On Fri, Apr 7, 2023 at 10:57 AM Bryan Bende  wrote:
>
> James,
>
> I'm not sure what the end goal is, but why do you need to use
> EvaluateJsonPath and SplitJson?
>
> Generally you don't want to split a flow file of multiple records into
> 1 record per flow file, this is an anti-pattern that leads to poor
> performance in the flow.
>
> Thanks,
>
> Bryan
>
> On Fri, Apr 7, 2023 at 9:41 AM James McMahon  wrote:
> >
> > Very interesting, very helpful insights. Thank you again, Mike.
> > Late last night I decided to punt on a pure NiFi solution. I knew I
> could do this easily with Groovy scripting, and I knew that was well-within
> my wheelhouse. So that's what I did: Groovy from an ExecuteScript
> processor. I'm 90% of the way there. Just a few more refinements to get
> just what I want, which I'll tackle later tonight.
> > Groovy is pretty cool. Flexible, easily tailored to just what you need.
> I like having that flexibility. And I like having options, too: your
> results have motivated me to look at using QueryRecords, etc etc.
> > Jim
> >
> > On Fri, Apr 7, 2023 at 9:32 AM Mike Sofen  wrote:
> >>
> >> This is where I felt Nifi wasn’t the right tool for the job and
> Postgres was.  After I imported the CSV directly into a staging table in
> the database (using Nifi), I converted the payload part of the columns into
> jsonb and stored that into the final table in a column with additional
> columns as relational data (timestamps, identifiers, etc).  It was an
> object-relational data model.
> >>
> >>
> >>
> >> THEN, using the amazingly powerful Postgres jsonb functions, I was able
> to extract the unique keys in an entire dataset or across multiple datasets
> (to

Re: Handling CSVs dynamically with NiFi

2023-04-12 Thread James McMahon
Thank you very much Isha. This is helpful. Assuming I wanted to route to N
different output paths, does it follow that I need to use N different Query
Record processors tailored to filter for just one subset?
I'll have to experiment with it to develop more of a feel for how it can be
used.
Thanks again for taking a moment to reply with the suggestion.
Jim

On Wed, Apr 12, 2023 at 6:54 AM Isha Lamboo 
wrote:

> Hi James,
>
>
>
> One option you can use is the QueryRecord processor. It allows you to
> filter records with a SQL-like query for any combination of fields that
> your downstream tools require. You can add one for each different output
> required and send a copy of the main json file to each.
>
>
>
> This approach should work better if you have a limited number of different
> output files with many records each. If your goal is hundreds of different
> json files with a handful of records each, then splitting per row might be
> quicker than copying the entire json file that many times.
>
>
>
> Regards,
>
>
>
> Isha
>
>
>
> *Van:* James McMahon 
> *Verzonden:* vrijdag 7 april 2023 17:14
> *Aan:* users@nifi.apache.org
> *Onderwerp:* Re: Handling CSVs dynamically with NiFi
>
>
>
> Hello Bryan. Thank you for your question.
>
> A downstream consumer requires the complete set in json. So that's part of
> why I convert.
>
> Other downstream tools require json input, but not the entire set. The
> data needs to be routed based on certain features. Geographic location.
> Sales data by zip codes. Etc, etc. Splitting the records out seemed to be a
> reasonable option to route individual records.
>
> I appreciate you taking the time to ask. You are far more familiar with
> nifi best practices than me. If there is a better way than what I intended,
> please fire away. I'd love to march down a better path if there is one.
>
> Cheers,
>
> Jim
>
>
>
> On Fri, Apr 7, 2023 at 10:57 AM Bryan Bende  wrote:
>
> James,
>
> I'm not sure what the end goal is, but why do you need to use
> EvaluateJsonPath and SplitJson?
>
> Generally you don't want to split a flow file of multiple records into
> 1 record per flow file, this is an anti-pattern that leads to poor
> performance in the flow.
>
> Thanks,
>
> Bryan
>
> On Fri, Apr 7, 2023 at 9:41 AM James McMahon  wrote:
> >
> > Very interesting, very helpful insights. Thank you again, Mike.
> > Late last night I decided to punt on a pure NiFi solution. I knew I
> could do this easily with Groovy scripting, and I knew that was well-within
> my wheelhouse. So that's what I did: Groovy from an ExecuteScript
> processor. I'm 90% of the way there. Just a few more refinements to get
> just what I want, which I'll tackle later tonight.
> > Groovy is pretty cool. Flexible, easily tailored to just what you need.
> I like having that flexibility. And I like having options, too: your
> results have motivated me to look at using QueryRecords, etc etc.
> > Jim
> >
> > On Fri, Apr 7, 2023 at 9:32 AM Mike Sofen  wrote:
> >>
> >> This is where I felt Nifi wasn’t the right tool for the job and
> Postgres was.  After I imported the CSV directly into a staging table in
> the database (using Nifi), I converted the payload part of the columns into
> jsonb and stored that into the final table in a column with additional
> columns as relational data (timestamps, identifiers, etc).  It was an
> object-relational data model.
> >>
> >>
> >>
> >> THEN, using the amazingly powerful Postgres jsonb functions, I was able
> to extract the unique keys in an entire dataset or across multiple datasets
> (to build a data catalog for example), perform a wide range of validations
> on individual keys, etc.  I use the word amazing because they are not just
> powerful functions but they run surprisingly fast given the amount of
> string data they are traversing.
> >>
> >>
> >>
> >> Mike Sofen
> >>
> >>
> >>
> >> From: James McMahon 
> >> Sent: Thursday, April 06, 2023 2:03 PM
> >> To: users@nifi.apache.org
> >> Subject: Re: Handling CSVs dynamically with NiFi
> >>
> >>
> >>
> >> Can I ask you one follow-up? I've gotten my ConvertRecord to work. I
> created a CsvReader service with Schema Access Strategy of Use String
> Fields From Header. I created a JsonRecordSetWriter service with Schema
> Write Strategy of Do Not Write Schema.
> >>
> >> When ConvertRecord is finished, my result looks like this sample:
> >>
> >> [ {
> >>   "Bank Name�" : "Almena State Bank",
> &

Re: Handling CSVs dynamically with NiFi

2023-04-07 Thread James McMahon
Hello Bryan. Thank you for your question.
A downstream consumer requires the complete set in json. So that's part of
why I convert.
Other downstream tools require json input, but not the entire set. The data
needs to be routed based on certain features. Geographic location. Sales
data by zip codes. Etc, etc. Splitting the records out seemed to be a
reasonable option to route individual records.
I appreciate you taking the time to ask. You are far more familiar with
nifi best practices than me. If there is a better way than what I intended,
please fire away. I'd love to march down a better path if there is one.
Cheers,
Jim

On Fri, Apr 7, 2023 at 10:57 AM Bryan Bende  wrote:

> James,
>
> I'm not sure what the end goal is, but why do you need to use
> EvaluateJsonPath and SplitJson?
>
> Generally you don't want to split a flow file of multiple records into
> 1 record per flow file, this is an anti-pattern that leads to poor
> performance in the flow.
>
> Thanks,
>
> Bryan
>
> On Fri, Apr 7, 2023 at 9:41 AM James McMahon  wrote:
> >
> > Very interesting, very helpful insights. Thank you again, Mike.
> > Late last night I decided to punt on a pure NiFi solution. I knew I
> could do this easily with Groovy scripting, and I knew that was well-within
> my wheelhouse. So that's what I did: Groovy from an ExecuteScript
> processor. I'm 90% of the way there. Just a few more refinements to get
> just what I want, which I'll tackle later tonight.
> > Groovy is pretty cool. Flexible, easily tailored to just what you need.
> I like having that flexibility. And I like having options, too: your
> results have motivated me to look at using QueryRecords, etc etc.
> > Jim
> >
> > On Fri, Apr 7, 2023 at 9:32 AM Mike Sofen  wrote:
> >>
> >> This is where I felt Nifi wasn’t the right tool for the job and
> Postgres was.  After I imported the CSV directly into a staging table in
> the database (using Nifi), I converted the payload part of the columns into
> jsonb and stored that into the final table in a column with additional
> columns as relational data (timestamps, identifiers, etc).  It was an
> object-relational data model.
> >>
> >>
> >>
> >> THEN, using the amazingly powerful Postgres jsonb functions, I was able
> to extract the unique keys in an entire dataset or across multiple datasets
> (to build a data catalog for example), perform a wide range of validations
> on individual keys, etc.  I use the word amazing because they are not just
> powerful functions but they run surprisingly fast given the amount of
> string data they are traversing.
> >>
> >>
> >>
> >> Mike Sofen
> >>
> >>
> >>
> >> From: James McMahon 
> >> Sent: Thursday, April 06, 2023 2:03 PM
> >> To: users@nifi.apache.org
> >> Subject: Re: Handling CSVs dynamically with NiFi
> >>
> >>
> >>
> >> Can I ask you one follow-up? I've gotten my ConvertRecord to work. I
> created a CsvReader service with Schema Access Strategy of Use String
> Fields From Header. I created a JsonRecordSetWriter service with Schema
> Write Strategy of Do Not Write Schema.
> >>
> >> When ConvertRecord is finished, my result looks like this sample:
> >>
> >> [ {
> >>   "Bank Name�" : "Almena State Bank",
> >>   "City�" : "Almena",
> >>   "State�" : "KS",
> >>   "Cert�" : "15426",
> >>   "Acquiring Institution�" : "Equity Bank",
> >>   "Closing Date�" : "23-Oct-20",
> >>   "Fund" : "10538"
> >> }, {
> >>   "Bank Name�" : "First City Bank of Florida",
> >>   "City�" : "Fort Walton Beach",
> >>   "State�" : "FL",
> >>   "Cert�" : "16748",
> >>   "Acquiring Institution�" : "United Fidelity Bank, fsb",
> >>   "Closing Date�" : "16-Oct-20",
> >>   "Fund" : "10537"
> >> }, {
> >>   "Bank Name�" : "The First State Bank",
> >>   "City�" : "Barboursville",
> >>   "State�" : "WV",
> >>   "Cert�" : "14361",
> >>   "Acquiring Institution�" : "MVB Bank, Inc.",
> >>   "Closing Date�" : "3-Apr-20",
> >>   "Fund" : "10536"
> >> }]
> >>
> >>
> >>
> >> I don't really have a schema. 

Re: Handling CSVs dynamically with NiFi

2023-04-07 Thread James McMahon
Very interesting, very helpful insights. Thank you again, Mike.
Late last night I decided to punt on a pure NiFi solution. I knew I could
do this easily with Groovy scripting, and I knew that was well-within my
wheelhouse. So that's what I did: Groovy from an ExecuteScript processor.
I'm 90% of the way there. Just a few more refinements to get just what I
want, which I'll tackle later tonight.
Groovy is pretty cool. Flexible, easily tailored to just what you need. I
like having that flexibility. And I like having options, too: your results
have motivated me to look at using QueryRecords, etc etc.
Jim

On Fri, Apr 7, 2023 at 9:32 AM Mike Sofen  wrote:

> This is where I felt Nifi wasn’t the right tool for the job and Postgres
> was.  After I imported the CSV directly into a staging table in the
> database (using Nifi), I converted the payload part of the columns into
> jsonb and stored that into the final table in a column with additional
> columns as relational data (timestamps, identifiers, etc).  It was an
> object-relational data model.
>
>
>
> THEN, using the amazingly powerful Postgres jsonb functions, I was able to
> extract the unique keys in an entire dataset or across multiple datasets
> (to build a data catalog for example), perform a wide range of validations
> on individual keys, etc.  I use the word amazing because they are not just
> powerful functions but they run surprisingly fast given the amount of
> string data they are traversing.
>
>
>
> Mike Sofen
>
>
>
> *From:* James McMahon 
> *Sent:* Thursday, April 06, 2023 2:03 PM
> *To:* users@nifi.apache.org
> *Subject:* Re: Handling CSVs dynamically with NiFi
>
>
>
> Can I ask you one follow-up? I've gotten my ConvertRecord to work. I
> created a CsvReader service with Schema Access Strategy of Use String
> Fields From Header. I created a JsonRecordSetWriter service with Schema
> Write Strategy of Do Not Write Schema.
>
> When ConvertRecord is finished, my result looks like this sample:
>
> [ {
>   "Bank Name�" : "Almena State Bank",
>   "City�" : "Almena",
>   "State�" : "KS",
>   "Cert�" : "15426",
>   "Acquiring Institution�" : "Equity Bank",
>   "Closing Date�" : "23-Oct-20",
>   "Fund" : "10538"
> }, {
>   "Bank Name�" : "First City Bank of Florida",
>   "City�" : "Fort Walton Beach",
>   "State�" : "FL",
>   "Cert�" : "16748",
>   "Acquiring Institution�" : "United Fidelity Bank, fsb",
>   "Closing Date�" : "16-Oct-20",
>   "Fund" : "10537"
> }, {
>   "Bank Name�" : "The First State Bank",
>   "City�" : "Barboursville",
>   "State�" : "WV",
>   "Cert�" : "14361",
>   "Acquiring Institution�" : "MVB Bank, Inc.",
>   "Closing Date�" : "3-Apr-20",
>   "Fund" : "10536"
> }]
>
>
>
> I don't really have a schema. How can I use a combination of SplitJson and
> EvaluateJsonPath to split each json object out to its own nifi flowfile,
> and to pull the json key values out to define the fields in the csv header?
> I've found a few examples through research that allude to this, but they
> all seem to have a fixed schema and they don't offer configurations for the
> SplitJson. In a case where my json keys definition changes depending on the
> lfowfile, what should JsonPathExpression be set to in the SplitJson
> configuration?
>
>
>
> On Thu, Apr 6, 2023 at 9:59 AM Mike Sofen  wrote:
>
> Jim – that’s exactly what I did on that “pre” step – generate a schema
> from the CSVReader and use that to dynamically create the DDL sql needed to
> build the staging table in Postgres.  In my solution, there are 2 separate
> pipelines running – this pre step and the normal file processing.
>
>
>
> I used the pre step to ensure that all incoming files were from a known
> and valid source and that they conformed to the schema for that source – a
> very tidy way to ensure data quality.
>
>
>
> Mike
>
>
>
> *From:* James McMahon 
> *Sent:* Thursday, April 06, 2023 6:39 AM
> *To:* users@nifi.apache.org
> *Subject:* Re: Handling CSVs dynamically with NiFi
>
>
>
> Thank you both very much, Bryan and Mike. Mike, had you considered the
> approach mentioned by Bryan - a Reader processor to infer schema  -  and
> found it wasn't suitable for your use case, for some reason? For instance,
> perhaps you were employing a version of Apache NiFi that did not 

Re: Handling CSVs dynamically with NiFi

2023-04-06 Thread James McMahon
Can I ask you one follow-up? I've gotten my ConvertRecord to work. I
created a CsvReader service with Schema Access Strategy of Use String
Fields From Header. I created a JsonRecordSetWriter service with Schema
Write Strategy of Do Not Write Schema.
When ConvertRecord is finished, my result looks like this sample:

[ {
  "Bank Name�" : "Almena State Bank",
  "City�" : "Almena",
  "State�" : "KS",
  "Cert�" : "15426",
  "Acquiring Institution�" : "Equity Bank",
  "Closing Date�" : "23-Oct-20",
  "Fund" : "10538"
}, {
  "Bank Name�" : "First City Bank of Florida",
  "City�" : "Fort Walton Beach",
  "State�" : "FL",
  "Cert�" : "16748",
  "Acquiring Institution�" : "United Fidelity Bank, fsb",
  "Closing Date�" : "16-Oct-20",
  "Fund" : "10537"
}, {
  "Bank Name�" : "The First State Bank",
  "City�" : "Barboursville",
  "State�" : "WV",
  "Cert�" : "14361",
  "Acquiring Institution�" : "MVB Bank, Inc.",
  "Closing Date�" : "3-Apr-20",
  "Fund" : "10536"
}]

I don't really have a schema. How can I use a combination of SplitJson and
EvaluateJsonPath to split each json object out to its own nifi flowfile,
and to pull the json key values out to define the fields in the csv header?
I've found a few examples through research that allude to this, but they
all seem to have a fixed schema and they don't offer configurations for the
SplitJson. In a case where my json keys definition changes depending on the
flowfile, what should JsonPathExpression be set to in the SplitJson
configuration?
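
In case SplitJson proves awkward with a shifting key set, here is a rough ExecuteScript (Groovy) sketch of the same idea: split a top-level JSON array into one flowfile per object and copy that object's keys onto attributes. The csv. attribute prefix and the approach as a whole are illustrative assumptions, not a tested flow.

import groovy.json.JsonSlurper
import groovy.json.JsonOutput
import org.apache.nifi.processor.io.InputStreamCallback
import org.apache.nifi.processor.io.OutputStreamCallback

def parent = session.get()
if (!parent) return
try {
    // Parse the whole array; fine for modest files, large files deserve a record-based approach
    def records = []
    session.read(parent, { inputStream ->
        records = new JsonSlurper().parse(inputStream)
    } as InputStreamCallback)

    records.each { obj ->
        // One child flowfile per JSON object; content is that object re-serialized
        def child = session.create(parent)
        child = session.write(child, { outputStream ->
            outputStream.write(JsonOutput.toJson(obj).getBytes('UTF-8'))
        } as OutputStreamCallback)
        // Copy the object's keys/values onto attributes (the prefix is an arbitrary choice)
        obj.each { k, v ->
            child = session.putAttribute(child, "csv.${k}".toString(), String.valueOf(v))
        }
        session.transfer(child, REL_SUCCESS)
    }
    // Drop the original multi-record flowfile
    session.remove(parent)
} catch (Exception e) {
    log.error('Error splitting JSON array into per-record flowfiles', e)
    session.transfer(parent, REL_FAILURE)
}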

On Thu, Apr 6, 2023 at 9:59 AM Mike Sofen  wrote:

> Jim – that’s exactly what I did on that “pre” step – generate a schema
> from the CSVReader and use that to dynamically create the DDL sql needed to
> build the staging table in Postgres.  In my solution, there are 2 separate
> pipelines running – this pre step and the normal file processing.
>
>
>
> I used the pre step to ensure that all incoming files were from a known
> and valid source and that they conformed to the schema for that source – a
> very tidy way to ensure data quality.
>
>
>
> Mike
>
>
>
> *From:* James McMahon 
> *Sent:* Thursday, April 06, 2023 6:39 AM
> *To:* users@nifi.apache.org
> *Subject:* Re: Handling CSVs dynamically with NiFi
>
>
>
> Thank you both very much, Bryan and Mike. Mike, had you considered the
> approach mentioned by Bryan - a Reader processor to infer schema  -  and
> found it wasn't suitable for your use case, for some reason? For instance,
> perhaps you were employing a version of Apache NiFi that did not afford
> access to a CsvReader or InferAvroSchema processor?
>
> Jim
>
>
>
> On Thu, Apr 6, 2023 at 9:30 AM Mike Sofen  wrote:
>
> Hi James,
>
>
>
> I don’t have time to go into details, but I had nearly the same scenario
> and solved it by using Nifi as the file processing piece only, sending
> valid CSV files (valid as in CSV formatting) and leveraged Postgres to land
> the CSV data into pre-built staging tables and from there did content
> validations and packaging into jsonb for storage into a single target
> table.
>
>
>
> In my case, an external file source had to “register” a single file (to
> allow creating the matching staging table) prior to sending data.  I used
> Nifi for that pre-staging step to derive the schema for the staging table
> for a file and I used a complex stored procedure to handle a massive amount
> of logic around the contents of a file when processing the actual files
> prior to storing into the destination table.
>
>
>
> Nifi was VERY fast and efficient in this, as was Postgres.
>
>
>
> Mike Sofen
>
>
>
> *From:* James McMahon 
> *Sent:* Thursday, April 06, 2023 4:35 AM
> *To:* users 
> *Subject:* Handling CSVs dynamically with NiFi
>
>
>
> We have a task requiring that we transform incoming CSV files to JSON. The
> CSVs vary in schema.
>
>
>
> There are a number of interesting flow examples out there illustrating how
> one can set up a flow to handle the case where the CSV schema is well known
> and fixed, but none for the generalized case.
>
>
>
> The structure of the incoming CSV files will not be known in advance in
> our use case. Our nifi flow must be generalized because I cannot configure
> and rely on a service that defines a specific fixed Avro schema registry.
> An Avro schema registry seems to presume an awareness of the CSV
> structure in advance. We don't have that luxury in this use case, with CSVs
> arriving from many different providers and so characterized by schemas that
> are unknown.
>
>
>
> What is the best way to get around this challenge? Does anyone know of an
> example where NiFi builds the schema on the fly as CSVs arrive for
> processing, dynamically defining the Avro schema for the CSV?
>
>
>
> Thanks in advance for any thoughts.
>
>


Re: Handling CSVs dynamically with NiFi

2023-04-06 Thread James McMahon
Thank you both very much, Bryan and Mike. Mike, had you considered the
approach mentioned by Bryan - a Reader processor to infer schema  -  and
found it wasn't suitable for your use case, for some reason? For instance,
perhaps you were employing a version of Apache NiFi that did not afford
access to a CsvReader or InferAvroSchema processor?
Jim

On Thu, Apr 6, 2023 at 9:30 AM Mike Sofen  wrote:

> Hi James,
>
>
>
> I don’t have time to go into details, but I had nearly the same scenario
> and solved it by using Nifi as the file processing piece only, sending
> valid CSV files (valid as in CSV formatting) and leveraged Postgres to land
> the CSV data into pre-built staging tables and from there did content
> validations and packaging into jsonb for storage into a single target
> table.
>
>
>
> In my case, an external file source had to “register” a single file (to
> allow creating the matching staging table) prior to sending data.  I used
> Nifi for that pre-staging step to derive the schema for the staging table
> for a file and I used a complex stored procedure to handle a massive amount
> of logic around the contents of a file when processing the actual files
> prior to storing into the destination table.
>
>
>
> Nifi was VERY fast and efficient in this, as was Postgres.
>
>
>
> Mike Sofen
>
>
>
> *From:* James McMahon 
> *Sent:* Thursday, April 06, 2023 4:35 AM
> *To:* users 
> *Subject:* Handling CSVs dynamically with NiFi
>
>
>
> We have a task requiring that we transform incoming CSV files to JSON. The
> CSVs vary in schema.
>
>
>
> There are a number of interesting flow examples out there illustrating how
> one can set up a flow to handle the case where the CSV schema is well known
> and fixed, but none for the generalized case.
>
>
>
> The structure of the incoming CSV files will not be known in advance in
> our use case. Our nifi flow must be generalized because I cannot configure
> and rely on a service that defines a specific fixed Avro schema registry.
> An Avro schema registry seems to presume an awareness of the CSV
> structure in advance. We don't have that luxury in this use case, with CSVs
> arriving from many different providers and so characterized by schemas that
> are unknown.
>
>
>
> What is the best way to get around this challenge? Does anyone know of an
> example where NiFi builds the schema on the fly as CSVs arrive for
> processing, dynamically defining the Avro schema for the CSV?
>
>
>
> Thanks in advance for any thoughts.
>


Handling CSVs dynamically with NiFi

2023-04-06 Thread James McMahon
We have a task requiring that we transform incoming CSV files to JSON. The
CSVs vary in schema.

There are a number of interesting flow examples out there illustrating how
one can set up a flow to handle the case where the CSV schema is well known
and fixed, but none for the generalized case.

The structure of the incoming CSV files will not be known in advance in our
use case. Our nifi flow must be generalized because I cannot configure and
rely on a service that defines a specific fixed Avro schema registry. An Avro
schema registry seems to presume an awareness of the CSV structure in
advance. We don't have that luxury in this use case, with CSVs arriving
from many different providers and so characterized by schemas that are
unknown.

What is the best way to get around this challenge? Does anyone know of an
example where NiFi builds the schema on the fly as CSVs arrive for
processing, dynamically defining the Avro schema for the CSV?

Thanks in advance for any thoughts.


IAM for authentication and authorization to NiFi?

2023-03-17 Thread James McMahon
Hello. We run nifi on an AWS EC2 instance. I currently employ certs for
nifi user authentication. The CA is in our nifi truststore. Users install
certs issued by the CA in their browsers. I've set up a parsing pattern in
nifi.properties to extract user identities from the CN of the cert, and I
employ those user identities to compare against entries in nifi policies
and nifi groups to authorize what each user is entitled to do within nifi.

My team lead has asked whether we can replace the CA and certificates
dependency with AWS IAM, and it is not clear to me that such a change would
be possible. Can anyone refer me to a guide that shows whether IAM can
supplant authentication to NiFi by cert and authorization by IAM identity
against nifi user and group policies?

It seems to me that IAM is ideal for identity and access management to *AWS*
resources. For example, we can set up roles to permit and control access to
S3 buckets, or to control access to services like AWS Lambda. But IAM is
not intended to be used as a CA or as an authorization mechanism to/within
nifi. Am I mistaken?

Thank you in advance.


Re: FlattenJSON is failing

2023-02-15 Thread James McMahon
That was it - thank you very much Rafael. Once I corrected that in the
attribute, the (un)FlattenJSON worked like a champ. I sure appreciate your
help.
Jim

On Tue, Feb 14, 2023 at 7:27 PM Rafael Fracasso 
wrote:

> Hey, I got the same problem while reproducing your scenario:
>
> 18:05:46 AMT
> ERROR
> 51f2c975-0186-1000-27d8-85a989a6d1c8
>
> FlattenJson[id=51f2c975-0186-1000-27d8-85a989a6d1c8] Failed to unflatten 
> JSON: java.lang.ClassCastException: class 
> com.fasterxml.jackson.databind.node.TextNode cannot be cast to class 
> com.fasterxml.jackson.databind.node.ObjectNode 
> (com.fasterxml.jackson.databind.node.TextNode and 
> com.fasterxml.jackson.databind.node.ObjectNode are in unnamed module of 
> loader org.apache.nifi.nar.NarClassLoader @1642eeae)
>
>
>
>
>
> But then I analyse closely your json structure:
>
> {
>"filename":"PLACES_ABC.csv",
>
>  
> "sourcing.SHA256":"d46884d5b9f2617a9f16e7a4e8b056036f07cb02cb85953c5065dd55ff8e3c33",
>"sourcing.MD5":"dd74cb837e5e701cdfa1fa070703be48",
>
>  
> "sourcing.sourceSHA256":"e3daeb8cfd6db4aad20bb42900bc5fa4815eba7e55d97cb01a1a9674668f20b2",
>"sourcing.sourceMD5":"a18eed985ddb04cbe13b487062628585",
>"triage.datatype":"mdb",
>"triage.mdb.version":"JET4",
>"triage.mdb.tables":"PLACES::-::ACCOUNTS::-::VEHICLES",
>"triage.mdb.table.rowcount":"9982",
>"triage.mdb.table":"PLACES",
>"triage.mdb.table.header":"FIELDA,FIELDB,FIELDC",
>"triage.mdb.table.database":"ABC.mdb"
> }
>
> You got a property named table with value "PLACES", than you have a nested
> header and database properties.
>
> With the correct structure (add a property for the "PLACES" table), the
> processor works as spected:
>
> {
>"filename":"PLACES_ABC.csv",
>"sourcing":{
>
> "SHA256":"d46884d5b9f2617a9f16e7a4e8b056036f07cb02cb85953c5065dd55ff8e3c33",
>   "MD5":"dd74cb837e5e701cdfa1fa070703be48",
>
> "sourceSHA256":"e3daeb8cfd6db4aad20bb42900bc5fa4815eba7e55d97cb01a1a9674668f20b2",
>   "sourceMD5":"a18eed985ddb04cbe13b487062628585"
>},
>"triage":{
>   "datatype":"mdb",
>   "mdb":{
>  "version":"JET4",
>  "tables":"PLACES::-::ACCOUNTS::-::VEHICLES",
>  "table":{
> "rowcount":"9982",
> "table":"PLACES",
> "header":"FIELDA,FIELDB,FIELDC",
> "database":"ABC.mdb"
>  }
>   }
>}
> }
>
> On Tue, Feb 14, 2023 at 4:11 PM James McMahon 
> wrote:
>
>> I have used AttributeToJSON to generate this JSON:
>>
>>
>> {"sourcing.SHA256":"d46884d5b9f2617a9f16e7a4e8b056036f07cb02cb85953c5065dd55ff8e3c33","sourcing.MD5":"dd74cb837e5e701cdfa1fa070703be48","filename":"PLACES_ABC.csv","sourcing.sourceSHA256":"e3daeb8cfd6db4aad20bb42900bc5fa4815eba7e55d97cb01a1a9674668f20b2","triage.datatype":"mdb","triage.mdb.version":"JET4","triage.mdb.tables":"PLACES::-::ACCOUNTS::-::VEHICLES","sourcing.sourceMD5":"a18eed985ddb04cbe13b487062628585","triage.mdb.table.rowcount":"9982","triage.mdb.table":"PLACES","triage.mdb.table.header":"FIELDA,FIELDB,FIELDC","triage.mdb.table.database":"ABC.mdb"}
>>
>>
>> I try to employ a FlattenJSON to (un)flatten my JSON. It is configured
>> like so:
>> Separator                     .
>> Flatten Mode                  dot notation
>> Ignore Reserved Characters    false
>> Return Type                   unflatten
>> Character Set                 UTF-8
>> Pretty Print JSON             false
>>
>>
>> This error below results. Why? How can I get past this problem?
>>
>> 19:54:40 UTC
>> ERROR
>> 4d6c3f2a-a72e-16b2-68ac-c90d5c31498d
>>
>> FlattenJson[id=4d6c3f2a-a72e-16b2-68ac-c90d5c31498d] Failed to unflatten 
>> JSON: java.lang.ClassCastException: class 
>> com.fasterxml.jackson.databind.node.TextNode cannot be cast to class 
>> com.fasterxml.jackson.databind.node.ObjectNode 
>> (com.fasterxml.jackson.databind.node.TextNode and 
>> com.fasterxml.jackson.databind.node.ObjectNode are in unnamed module of 
>> loader org.apache.nifi.nar.NarClassLoader @2679311f)
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>


FlattenJSON is failing

2023-02-14 Thread James McMahon
I have used AttributeToJSON to generate this JSON:

{"sourcing.SHA256":"d46884d5b9f2617a9f16e7a4e8b056036f07cb02cb85953c5065dd55ff8e3c33","sourcing.MD5":"dd74cb837e5e701cdfa1fa070703be48","filename":"PLACES_ABC.csv","sourcing.sourceSHA256":"e3daeb8cfd6db4aad20bb42900bc5fa4815eba7e55d97cb01a1a9674668f20b2","triage.datatype":"mdb","triage.mdb.version":"JET4","triage.mdb.tables":"PLACES::-::ACCOUNTS::-::VEHICLES","sourcing.sourceMD5":"a18eed985ddb04cbe13b487062628585","triage.mdb.table.rowcount":"9982","triage.mdb.table":"PLACES","triage.mdb.table.header":"FIELDA,FIELDB,FIELDC","triage.mdb.table.database":"ABC.mdb"}


I try to employ a FlattenJSON to (un)flatten my JSON. It is configured like
so:
Separator                     .
Flatten Mode                  dot notation
Ignore Reserved Characters    false
Return Type                   unflatten
Character Set                 UTF-8
Pretty Print JSON             false


This error below results. Why? How can I get past this problem?

19:54:40 UTC
ERROR
4d6c3f2a-a72e-16b2-68ac-c90d5c31498d

FlattenJson[id=4d6c3f2a-a72e-16b2-68ac-c90d5c31498d] Failed to
unflatten JSON: java.lang.ClassCastException: class
com.fasterxml.jackson.databind.node.TextNode cannot be cast to class
com.fasterxml.jackson.databind.node.ObjectNode
(com.fasterxml.jackson.databind.node.TextNode and
com.fasterxml.jackson.databind.node.ObjectNode are in unnamed module
of loader org.apache.nifi.nar.NarClassLoader @2679311f)


Re: How to cherry pick a specific line from a flowfile?

2023-02-11 Thread James McMahon
Wanted to thank Matt and Mark for their assistance. Also wanted to follow
up with a final reply detailing how I got this to work in case it may
benefit anyone else down the road.

*The configuration of the RouteText processor that worked to extract a
specific line:*
Routing Strategy                       Route to 'matched' if line matches all conditions
Matching Strategy                      Satisfies Expression
Character Set                          UTF-8
Ignore Leading/Trailing Whitespace     true
Ignore Case                            false
Grouping Regular Expression            No value set
isHeader                               ${lineNo:equals(1)}

One observation about the RouteText solution: it appears to replace the
content of the flowfile with the match. So if you need to preserve the
incoming content of your flowfile for other purposes or if you want the
match to be saved to an attribute, this may not be for you. I was unable to
find a way to get RouteText to direct the match to a flowfile attribute
(using NiFi version 1.16.3. "Why a previous version, Jim, rather than 1.19
or the newly released 1.20?" Because I integrated some dependencies that
have only been tested through v1.16 - a consideration I'll have to tackle
later.)

*The groovy script I got to work that employs a BufferedReader to do the
same, saving to a new attribute:*

import org.apache.commons.io.IOUtils
import java.nio.charset.StandardCharsets
def header
def ff=session.get()
if(!ff) return
try {
    // Here we are reading from the current flowfile content and writing to the new content
    ff = session.write(ff, { inputStream, outputStream ->
        def bufferedReader = new BufferedReader(new InputStreamReader(inputStream))
        def bufferedWriter = new BufferedWriter(new OutputStreamWriter(outputStream))

        // Header is the first line...
        header = bufferedReader.readLine() + "\n"

        bufferedWriter.write(header)

        def line

        int i = 1

        // While the incoming line is not empty, write it to the outputStream
        while ((line = bufferedReader.readLine()) != null) {
            bufferedWriter.write(line)
            bufferedWriter.newLine()
            i++
        }

        // By default, INFO doesn't show in the logs and WARN will appear in the processor bulletins
        log.warn("Wrote ${i} lines to output")

        bufferedReader.close()
        bufferedWriter.close()
    } as StreamCallback)
    ff = session.putAttribute(ff, 'mdb.table.header', header)
    session.transfer(ff, REL_SUCCESS)
} catch (Exception e) {
    log.error('Error occurred extracting header for table in mdb file', e)
    session.transfer(ff, REL_FAILURE)
}

Jim

On Fri, Feb 10, 2023 at 7:19 AM James McMahon  wrote:

> Ah - of course. I went overboard here. Just because I don't use the
> OutputStream for this purpose doesn't mean I can assume the method
> signature for the session.write() doesn't still require it. I'll fix this
> later tonight. Thank you both very much, Matt and Mark.
> Jim
>
> On Thu, Feb 9, 2023 at 10:44 PM Matt Burgess  wrote:
>
>> session.write() doesn’t take just an InputStream, it either takes both an
>> InputStream and OutputStream (if using a StreamCallback like you are) or
>> just an OutputStream (using an OutputStreamCallback, usually for source
>> processors that don’t have FlowFile input)
>>
>> Sent from my iPhone
>>
>> On Feb 9, 2023, at 9:34 PM, James McMahon  wrote:
>>
>> 
>> Mark, your RouteText blog worked perfectly. Thank you very much.
>>
>> Matt, I still want to get the BufferedReader working. I'm close. Here is
>> my code, with the error that results. I do not know what this error means.
>> Any thoughts?
>>
>> import org.apache.commons.io.IOUtils
>> import java.nio.charset.StandardCharsets
>> def ff=session.get()
>> if(!ff)return
>> try {
>> // Here we are reading from the current flowfile content and writing
>> to the new content
>> //ff = session.write(ff, { inputStream, outputStream ->
>> ff = session.write(ff, { inputStream ->
>> def bufferedReader = new BufferedReader(new
>> InputStreamReader(inputStream))
>>
>> // Header is the first line...
>> def header = bufferedReader.readLine()
>> ff = session.putAttribute(ff, 'mdb.table.header', header)
>>
>>
>> //def bufferedWriter = new BufferedWriter(new
>> OutputStreamWriter(outputStream))
>> //def line
>>
>> int i = 0
>>
>> // While the incoming line is not empty, write it to the
>> outputStream
>> //while ((line = bufferedReader.readLine()) != null) {
>> //bufferedWriter.w

Re: How to cherry pick a specific line from a flowfile?

2023-02-10 Thread James McMahon
Ah - of course. I went overboard here. Just because I don't use the
OutputStream for this purpose doesn't mean I can assume the method
signature for the session.write() doesn't still require it. I'll fix this
later tonight. Thank you both very much, Matt and Mark.
Jim

On Thu, Feb 9, 2023 at 10:44 PM Matt Burgess  wrote:

> session.write() doesn’t take just an InputStream, it either takes both an
> InputStream and OutputStream (if using a StreamCallback like you are) or
> just an OutputStream (using an OutputStreamCallback, usually for source
> processors that don’t have FlowFile input)
>
> Sent from my iPhone
>
> On Feb 9, 2023, at 9:34 PM, James McMahon  wrote:
>
> 
> Mark, your RouteText blog worked perfectly. Thank you very much.
>
> Matt, I still want to get the BufferedReader working. I'm close. Here is
> my code, with the error that results. I do not know what this error means.
> Any thoughts?
>
> import org.apache.commons.io.IOUtils
> import java.nio.charset.StandardCharsets
> def ff=session.get()
> if(!ff)return
> try {
> // Here we are reading from the current flowfile content and writing
> to the new content
> //ff = session.write(ff, { inputStream, outputStream ->
> ff = session.write(ff, { inputStream ->
> def bufferedReader = new BufferedReader(new
> InputStreamReader(inputStream))
>
> // Header is the first line...
> def header = bufferedReader.readLine()
> ff = session.putAttribute(ff, 'mdb.table.header', header)
>
>
> //def bufferedWriter = new BufferedWriter(new
> OutputStreamWriter(outputStream))
> //def line
>
> int i = 0
>
> // While the incoming line is not empty, write it to the
> outputStream
> //while ((line = bufferedReader.readLine()) != null) {
> //bufferedWriter.write(line)
> //bufferedWriter.newLine()
> //i++
> //}
>
> // By default, INFO doesn't show in the logs and WARN will appear
> in the processor bulletins
> //log.warn("Wrote ${i} lines to output")
>
> bufferedReader.close()
> //bufferedWriter.close()
> } as StreamCallback)
>
> session.transfer(ff, REL_SUCCESS)
> } catch (Exception e) {
>  log.error('Error occurred extracting header for table in mdb file', e)
>  session.transfer(ff, REL_FAILURE)
> }
>
> The ExecuteScript processor throws this error:
> 02:30:38 UTC
> ERROR
> 4d6c3e21-a72e-16b2-6a7f-f5cadb351c0e
>
> ExecuteScript[id=4d6c3e21-a72e-16b2-6a7f-f5cadb351c0e] Error occurred 
> extracting header for table in mdb file: groovy.lang.MissingMethodException: 
> No signature of method: Script10$_run_closure1.doCall() is applicable for 
> argument types: 
> (org.apache.nifi.controller.repository.io.TaskTerminationInputStream...) 
> values: 
> [org.apache.nifi.controller.repository.io.TaskTerminationInputStream@5a2d22a7,
>  ...]
> Possible solutions: doCall(java.lang.Object), findAll(), findAll()
>
>
>
> On Thu, Feb 9, 2023 at 8:35 PM Mark Payne  wrote:
>
>> James,
>>
>> Have a look at the RouteText processor. I wrote a blog post recently on
>> using it:
>> https://medium.com/cloudera-inc/building-an-effective-nifi-flow-routetext-5068a3b4efb3
>>
>> Thanks
>> Mark
>>
>> Sent from my iPhone
>>
>> On Feb 9, 2023, at 8:06 PM, James McMahon  wrote:
>>
>> 
>> My version of nifi does not have Range Sampling unfortunately.
>> If I get the flowfile through a session as done in the Cookbook, does
>> anyone know of an approach in Groovy to grab line N and avoid loading the
>> entire CSV file into string variable *text*?
>>
>> On Thu, Feb 9, 2023 at 7:18 PM Matt Burgess  wrote:
>>
>>> I’m AFK ATM but Range Sampling was added into the SampleRecord processor
>>> (https://issues.apache.org/jira/browse/NIFI-9814), the Jira doesn’t say
>>> which version it went into but it is definitely in 1.19.1+. If that’s
>>> available to you then you can just specify “2” as the range and it will
>>> only return that line.
>>>
>>> For total record count without loading the whole thing into memory,
>>> there’s probably a more efficient way but you could use ConvertRecord and
>>> convert it from CSV to CSV and it should write out the “record.count”
>>> attribute. I think some/most/all record processors write this attribute,
>>> and they work record by record so they don’t load the whole thing into
>>> memory. Even SampleRecord adds a record.count attribute but if you specify
>>> one line the value will be 1 :)
>>>
>>> 

Re: How to cherry pick a specific line from a flowfile?

2023-02-09 Thread James McMahon
Mark, your RouteText blog worked perfectly. Thank you very much.

Matt, I still want to get the BufferedReader working. I'm close. Here is my
code, with the error that results. I do not know what this error means. Any
thoughts?

import org.apache.commons.io.IOUtils
import java.nio.charset.StandardCharsets
def ff=session.get()
if(!ff)return
try {
    // Here we are reading from the current flowfile content and writing to the new content
    //ff = session.write(ff, { inputStream, outputStream ->
    ff = session.write(ff, { inputStream ->
        def bufferedReader = new BufferedReader(new InputStreamReader(inputStream))

        // Header is the first line...
        def header = bufferedReader.readLine()
        ff = session.putAttribute(ff, 'mdb.table.header', header)

        //def bufferedWriter = new BufferedWriter(new OutputStreamWriter(outputStream))
        //def line

        int i = 0

        // While the incoming line is not empty, write it to the outputStream
        //while ((line = bufferedReader.readLine()) != null) {
        //    bufferedWriter.write(line)
        //    bufferedWriter.newLine()
        //    i++
        //}

        // By default, INFO doesn't show in the logs and WARN will appear in the processor bulletins
        //log.warn("Wrote ${i} lines to output")

        bufferedReader.close()
        //bufferedWriter.close()
    } as StreamCallback)

    session.transfer(ff, REL_SUCCESS)
} catch (Exception e) {
    log.error('Error occurred extracting header for table in mdb file', e)
    session.transfer(ff, REL_FAILURE)
}

The ExecuteScript processor throws this error:
02:30:38 UTC
ERROR
4d6c3e21-a72e-16b2-6a7f-f5cadb351c0e

ExecuteScript[id=4d6c3e21-a72e-16b2-6a7f-f5cadb351c0e] Error occurred
extracting header for table in mdb file:
groovy.lang.MissingMethodException: No signature of method:
Script10$_run_closure1.doCall() is applicable for argument types:
(org.apache.nifi.controller.repository.io.TaskTerminationInputStream...)
values: 
[org.apache.nifi.controller.repository.io.TaskTerminationInputStream@5a2d22a7,
...]
Possible solutions: doCall(java.lang.Object), findAll(), findAll()



On Thu, Feb 9, 2023 at 8:35 PM Mark Payne  wrote:

> James,
>
> Have a look at the RouteText processor. I wrote a blog post recently on
> using it:
> https://medium.com/cloudera-inc/building-an-effective-nifi-flow-routetext-5068a3b4efb3
>
> Thanks
> Mark
>
> Sent from my iPhone
>
> On Feb 9, 2023, at 8:06 PM, James McMahon  wrote:
>
> 
> My version of nifi does not have Range Sampling unfortunately.
> If I get the flowfile through a session as done in the Cookbook, does
> anyone know of an approach in Groovy to grab line N and avoid loading the
> entire CSV file into string variable *text*?
>
> On Thu, Feb 9, 2023 at 7:18 PM Matt Burgess  wrote:
>
>> I’m AFK ATM but Range Sampling was added into the SampleRecord processor (
>> https://issues.apache.org/jira/browse/NIFI-9814), the Jira doesn’t say
>> which version it went into but it is definitely in 1.19.1+. If that’s
>> available to you then you can just specify “2” as the range and it will
>> only return that line.
>>
>> For total record count without loading the whole thing into memory,
>> there’s probably a more efficient way but you could use ConvertRecord and
>> convert it from CSV to CSV and it should write out the “record.count”
>> attribute. I think some/most/all record processors write this attribute,
>> and they work record by record so they don’t load the whole thing into
>> memory. Even SampleRecord adds a record.count attribute but if you specify
>> one line the value will be 1 :)
>>
>> Regards,
>> Matt
>>
>>
>> On Feb 9, 2023, at 6:57 PM, James McMahon  wrote:
>>
>> 
>> Hello. I am trying to identify a header line and a data line count from a
>> flowfile that is in csv format.
>>
>> Most of us are familiar with Matt B's outstanding Cookbook series, and I
>> am trying to use that as my starting point. Here is my Groovy code:
>>
>> import org.apache.commons.io.IOUtils
>> import java.nio.charset.StandardCharsets
>> def ff=session.get()
>> if(!ff)return
>> try {
>>  def text = ''
>>  // Cast a closure with an inputStream parameter to
>> InputStreamCallback
>>  session.read(ff, {inputStream ->
>>   text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
>>   // Do something with text here
>>   // get header from the second line of the flowfile
>>   // set datacount as the total line count of the file - 2
>>   ...
>>   ff = session.putAttribute(ff, 'mdb.table.header', header)
>>   ff = session.putAttribute(ff, 'mdb.tab

Re: How to cherry pick a specific line from a flowfile?

2023-02-09 Thread James McMahon
My version of nifi does not have Range Sampling unfortunately.
If I get the flowfile through a session as done in the Cookbook, does
anyone know of an approach in Groovy to grab line N and avoid loading the
entire CSV file into string variable *text*?

On Thu, Feb 9, 2023 at 7:18 PM Matt Burgess  wrote:

> I’m AFK ATM but Range Sampling was added into the SampleRecord processor (
> https://issues.apache.org/jira/browse/NIFI-9814), the Jira doesn’t say
> which version it went into but it is definitely in 1.19.1+. If that’s
> available to you then you can just specify “2” as the range and it will
> only return that line.
>
> For total record count without loading the whole thing into memory,
> there’s probably a more efficient way but you could use ConvertRecord and
> convert it from CSV to CSV and it should write out the “record.count”
> attribute. I think some/most/all record processors write this attribute,
> and they work record by record so they don’t load the whole thing into
> memory. Even SampleRecord adds a record.count attribute but if you specify
> one line the value will be 1 :)
>
> Regards,
> Matt
>
>
> On Feb 9, 2023, at 6:57 PM, James McMahon  wrote:
>
> 
> Hello. I am trying to identify a header line and a data line count from a
> flowfile that is in csv format.
>
> Most of us are familiar with Matt B's outstanding Cookbook series, and I
> am trying to use that as my starting point. Here is my Groovy code:
>
> import org.apache.commons.io.IOUtils
> import java.nio.charset.StandardCharsets
> def ff=session.get()
> if(!ff)return
> try {
>  def text = ''
>  // Cast a closure with an inputStream parameter to InputStreamCallback
>  session.read(ff, {inputStream ->
>   text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
>   // Do something with text here
>   // get header from the second line of the flowfile
>   // set datacount as the total line count of the file - 2
>   ...
>   ff = session.putAttribute(ff, 'mdb.table.header', header)
>   ff = session.putAttribute(ff, 'mdb.table.datarecords', datacount)
>  } as InputStreamCallback)
>  session.transfer(flowFile, REL_SUCCESS)
> } catch(e) {
>  log.error('Error occurred identifying tables in mdb file', e)
>  session.transfer(ff, REL_FAILURE)
> }
>
> I want to avoid using that line in red, because as Matt cautions in his
> cookbook, our csv files are too large. I do not want to read in the entire
> file to variable text. It's going to be a problem.
>
> How in Groovy can I cherry pick only the line I want from the stream (line
> #2 in this case)?
>
> Also, how can I get a count of the total lines without loading them all
> into text?
>
> Thanks in advance for your help.
>
>


How to cherry pick a specific line from a flowfile?

2023-02-09 Thread James McMahon
Hello. I am trying to identify a header line and a data line count from a
flowfile that is in csv format.

Most of us are familiar with Matt B's outstanding Cookbook series, and I am
trying to use that as my starting point. Here is my Groovy code:

import org.apache.commons.io.IOUtils
import java.nio.charset.StandardCharsets
def ff=session.get()
if(!ff)return
try {
 def text = ''
 // Cast a closure with an inputStream parameter to InputStreamCallback
 session.read(ff, {inputStream ->
  text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
  // Do something with text here
  // get header from the second line of the flowfile
  // set datacount as the total line count of the file - 2
  ...
  ff = session.putAttribute(ff, 'mdb.table.header', header)
  ff = session.putAttribute(ff, 'mdb.table.datarecords', datacount)
 } as InputStreamCallback)
 session.transfer(flowFile, REL_SUCCESS)
} catch(e) {
 log.error('Error occurred identifying tables in mdb file', e)
 session.transfer(ff, REL_FAILURE)
}

I want to avoid using that line in red (the IOUtils.toString call), because as
Matt cautions in his cookbook, our csv files are too large. I do not want to read
the entire file into the variable text. It's going to be a problem.

How in Groovy can I cherry pick only the line I want from the stream (line
#2 in this case)?

Also, how can I get a count of the total lines without loading them all
into text?
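
The direction I am considering (a rough sketch only, not yet tested) is to read
line by line with a BufferedReader inside session.read, so each line is touched
once and never accumulated; the attribute names are the same ones from my snippet
above. Does this look reasonable?

import java.nio.charset.StandardCharsets
import org.apache.nifi.processor.io.InputStreamCallback

def ff = session.get()
if (!ff) return

try {
    def header = null
    long total = 0
    session.read(ff, { inputStream ->
        def reader = new BufferedReader(new InputStreamReader(inputStream, StandardCharsets.UTF_8))
        def line
        while ((line = reader.readLine()) != null) {
            total++
            if (total == 2) {
                header = line              // line #2 is the one I want
            }
        }
    } as InputStreamCallback)
    def datacount = total > 2 ? total - 2 : 0   // total line count minus the two leading lines
    ff = session.putAttribute(ff, 'mdb.table.header', header ?: '')
    ff = session.putAttribute(ff, 'mdb.table.datarecords', String.valueOf(datacount))
    session.transfer(ff, REL_SUCCESS)
} catch (Exception e) {
    log.error('Error occurred identifying tables in mdb file', e)
    session.transfer(ff, REL_FAILURE)
}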

Thanks in advance for your help.


Detecting if a flowfile is a packed file

2023-01-13 Thread James McMahon
Hello. Does anyone have an approach they would recommend for reliably
detecting if a flowfile is a packed file that needs to be routed through
UnpackContent?

I have a RouteOnAttribute within which I check for filenames that end in
tar or zip, but that seems unsophisticated and overly reliant on the file
name. I OR that with a check of mime.type too, but that result is sometimes
determined by file name too.

Any other "best practices" to do this?
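
One content-based check I have been sketching (rather than trusting the filename)
is to sniff the leading bytes in an ExecuteScript and set an attribute for
RouteOnAttribute to key on. The magic numbers below are the standard zip/gzip/tar
markers; the attribute name packed.format is just something I made up, and
IdentifyMimeType, which inspects content rather than the name, is probably the
more conventional route. A rough, untested sketch:

import org.apache.nifi.processor.io.InputStreamCallback

def ff = session.get()
if (!ff) return

def packed = 'none'
session.read(ff, { inputStream ->
    byte[] head = new byte[262]         // 262 bytes reaches the tar "ustar" marker at offset 257
    int n = inputStream.read(head)      // sketch only: a single read may return fewer bytes
    if (n >= 4 && head[0] == 0x50 && head[1] == 0x4B && head[2] == 0x03 && head[3] == 0x04) {
        packed = 'zip'
    } else if (n >= 2 && (head[0] & 0xFF) == 0x1F && (head[1] & 0xFF) == 0x8B) {
        packed = 'gzip'
    } else if (n >= 262 && new String(head, 257, 5, 'US-ASCII') == 'ustar') {
        packed = 'tar'
    }
} as InputStreamCallback)
ff = session.putAttribute(ff, 'packed.format', packed)
session.transfer(ff, REL_SUCCESS)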


Re: Unable to view files in queue

2023-01-12 Thread James McMahon
Okay. It’s good to know that size may simply be asking the UI to do too
much. Thank you Rafael.

On Thu, Jan 12, 2023 at 7:57 PM Rafael Fracasso 
wrote:

> Hi James, you are opening 110MB CSV in your web browser and indeed that's
> a little large.
>
> I suggest you use the download option instead of view.
>
> On Tue, Jan 10, 2023 at 9:06 PM James McMahon 
> wrote:
>
>> I have a group of csv files in a queue that I want to inspect. I List
>> Queue, select one that appears to be 110MB in size, go to my Details tab,
>> then click View. The UI spins and spins, displays a header as I would
>> expect, but never returns data rows.
>>
>> I'm not sure how to debug this. The nifi-app.log doesn't appear to show
>> any error. Is 110 MB too large to View from the queue UI interface? Is
>> there a way I can debug this, or perhaps a property I can try resetting to
>> a better value?
>>
>> I am running  nifi 1.16.3, not clustered.
>>
>> Thanks in advance for any help.
>>
>


Unable to view files in queue

2023-01-10 Thread James McMahon
I have a group of csv files in a queue that I want to inspect. I List
Queue, select one that appears to be 110MB in size, go to my Details tab,
then click View. The UI spins and spins, displays a header as I would
expect, but never returns data rows.

I'm not sure how to debug this. The nifi-app.log doesn't appear to show any
error. Is 110 MB too large to View from the queue UI interface? Is there a
way I can debug this, or perhaps a property I can try resetting to a better
value?

I am running  nifi 1.16.3, not clustered.

Thanks in advance for any help.


Re: NiFi failing to start

2023-01-02 Thread James McMahon
Russ and Chris, thank you for responding. I got this to work. It turned out
to be a combination of different issues. One was finding the proper
arguments for tls-toolkit so that my CN would be set properly in my certs.
The second was configuration settings in my nifi.properties file.

For that first challenge, these were the tls-toolkit commands I used that
worked:
./bin/tls-toolkit.sh standalone -n 'ec2-52-4-149-72.compute-1.amazonaws.com' --certificateAuthorityHostname 'ec2-52-4-149-72.compute-1.amazonaws.com'
./bin/tls-toolkit.sh standalone -B  -C "CN=admin, OU=NIFI"
Once I did this, I used openssl to query my pkcs12 file, verifying that my
CN now properly included ec2-52-4-149-72.compute-1.amazonaws.com rather
than *localhost*. This is an excerpt of what I found:

Command: openssl pkcs12 -info -nodes -in CN=admin_OU=NIFI.p12

Enter Import Password:

MAC Iteration 102400

MAC verified OK

PKCS7 Data

...

subject=/OU=NIFI/CN=admin

issuer=/OU=NIFI/CN=ec2-52-4-149-72.compute-1.amazonaws.com

-BEGIN CERTIFICATE-

...

-END CERTIFICATE-

Certificate bag

Bag Attributes: 

subject=/OU=NIFI/CN=ec2-52-4-149-72.compute-1.amazonaws.com

issuer=/OU=NIFI/CN=ec2-52-4-149-72.compute-1.amazonaws.com

-BEGIN CERTIFICATE-

…..

-END CERTIFICATE-

For that second impediment, I eventually got it to work using these entries
in my nifi.properties:
nifi.authorizer.configuration.file=/opt/nifi/config_resources/authorizers.xml
#nifi.login.identity.provider.configuration.file=./conf/login-identity-providers.xml
nifi.login.identity.provider.configuration.file=

I'm still not entirely sure why I don't need to have a
login-identity-providers.xml file, but suspect it is because I now employ
TLS to present my identity, maybe?
My authorizers.xml looks like this:




<authorizers>
    <userGroupProvider>
        <identifier>file-user-group-provider</identifier>
        <class>org.apache.nifi.authorization.FileUserGroupProvider</class>
        <property name="Users File">./conf/users.xml</property>
        <property name="Initial User Identity 1">CN=admin, OU=NiFi</property>
    </userGroupProvider>
    <accessPolicyProvider>
        <identifier>file-access-policy-provider</identifier>
        <class>org.apache.nifi.authorization.FileAccessPolicyProvider</class>
        <property name="User Group Provider">file-user-group-provider</property>
        <property name="Authorizations File">./conf/authorizations.xml</property>
        <property name="Initial Admin Identity">CN=admin, OU=NiFi</property>
        <property name="Node Identity 1">CN=admin, OU=NiFi</property>
    </accessPolicyProvider>
    <authorizer>
        <identifier>managed-authorizer</identifier>
        <class>org.apache.nifi.authorization.StandardManagedAuthorizer</class>
        <property name="Access Policy Provider">file-access-policy-provider</property>
    </authorizer>
</authorizers>



Thanks again for your replies.
Jim


On Wed, Dec 28, 2022 at 2:48 PM Russell Bateman 
wrote:

> In case you or someone else wishes only to run, develop, start, stop,
> start over, etc., and doesn't care to authenticate a (non-production)
> installation, I have followed this since NiFi 1.14 and last used it for
> 1.19:
>
> https://www.javahotchocolate.com/notes/nifi.html#20210716
>
> If this doesn't work it's usually because the properties file has become
> too modified. Start over with a fresh download.
>
> Russ
>
>
> On 12/28/22 12:03, Chris Sampson wrote:
>
> I think you will need to remove/comment out the references to
> single-user-provider in authorisers.xml and login-providers.xml as well as
> removing it from nifi.properties (see the comments in these files as
> they're provided in the nifi distributions).
>
> If you are using 2-way TLS authentication then I don't think you need to
> configure anything else, but remember that all of your nifi instances in
> your cluster (if applicable) will need to trust one another's certificates
> along with all user certificates - the easiest way of doing this is
> typically to trust a common CA that issues all the nifi instance and user
> certs. This could be nifi-toolkit, but beware that the CA used by toolkit
> is auto-generated on startup, so you need to retain and configure the same
> CA for toolkit of you plan to use it to issue new certs in future.
>
> On Wed, 28 Dec 2022, 17:32 James McMahon,  wrote:
>
>> I continue to experience errors when I try to start my nifi 1.16.3
>> instance. I have followed this guide in an effort to use the toolkit to
>> generate self-signed certs for user admin, signed by a nifi truststore:
>>
>> Apache NiFi Walkthroughs
>> <https://nifi.apache.org/docs/nifi-docs/html/walkthroughs.html>
>>
>> I seem to be having issues with this in my nifi.properties:
>> nifi.security.user.authorizer=single-user-authorizer
>>
>> When I set it to nothing, it tells me this is required. When I set it to
>> single-user-authorizer, this error results in the log:
>>  Error creating bean with name 'authorizer': FactoryBean threw exception
>> on object creation; nested exception is java.lang.Exception: The specified
>> authorizer 'single-user-authorizer' could not be found.
>>

Getting the proper CN in self-signed certs from tls-toolkit?

2022-12-29 Thread James McMahon
Not sure whether this question belongs in the users or developers domain.
Am asking in both hoping to get assistance.

I am trying to use tls-toolkit to create a CA and self-signed certs. I
notice that my CN in my pem file is not what I request on the command line.
How can I successfully force tls-toolkit to set the CN as requested?

I am running NiFi 1.16.3. Toolkit for that same version.

[ec2-user@ip-172-31-73-197 nifi-toolkit-1.16.3]$ sudo ./bin/tls-toolkit.sh standalone -n 'ec2-52-4-149-72.compute-1.amazonaws.com'

[main] INFO
org.apache.nifi.toolkit.tls.standalone.TlsToolkitStandaloneCommandLine - No
nifiPropertiesFile specified, using embedded one.
[main] INFO org.apache.nifi.toolkit.tls.standalone.TlsToolkitStandalone -
Running standalone certificate generation with output directory
../nifi-toolkit-1.16.3
[main] INFO org.apache.nifi.toolkit.tls.standalone.TlsToolkitStandalone -
Generated new CA certificate ../nifi-toolkit-1.16.3/nifi-cert.pem and key
../nifi-toolkit-1.16.3/nifi-key.key
[main] INFO org.apache.nifi.toolkit.tls.standalone.TlsToolkitStandalone -
Writing new ssl configuration to ../nifi-toolkit-1.16.3/
ec2-52-4-149-72.compute-1.amazonaws.com
[main] INFO org.apache.nifi.toolkit.tls.standalone.TlsToolkitStandalone -
Successfully generated TLS configuration for
ec2-52-4-149-72.compute-1.amazonaws.com 1 in ../nifi-toolkit-1.16.3/
ec2-52-4-149-72.compute-1.amazonaws.com
[main] INFO org.apache.nifi.toolkit.tls.standalone.TlsToolkitStandalone -
No clientCertDn specified, not generating any client certificates.
[main] INFO org.apache.nifi.toolkit.tls.standalone.TlsToolkitStandalone -
tls-toolkit standalone completed successfully
[ec2-user@ip-172-31-73-197 nifi-toolkit-1.16.3]$ ls -tl
total 84
drwx-- 2 root root71 Dec 29 22:22
ec2-52-4-149-72.compute-1.amazonaws.com
-rw--- 1 root root  1233 Dec 29 22:22 nifi-cert.pem
-rw--- 1 root root  1675 Dec 29 22:22 nifi-key.key
drwxr-xr-x 2 root root  4096 Jun 13  2022 bin
drwxr-xr-x 3 root root45 Jun 13  2022 classpath
drwxr-xr-x 2 root root88 Jun 13  2022 conf
drwxrwx--- 3 root root 16384 Jun 13  2022 lib
-rw-r--r-- 1 root root 41590 Jun 13  2022 LICENSE
-rw-r--r-- 1 root root  7372 Jun 13  2022 NOTICE

[ec2-user@ip-172-31-73-197 nifi-toolkit-1.16.3]$ sudo keytool -printcert -file nifi-cert.pem
Owner: CN=localhost, OU=NIFI
Issuer: CN=localhost, OU=NIFI
.
.
.


NiFi failing to start

2022-12-28 Thread James McMahon
I continue to experience errors when I try to start my nifi 1.16.3
instance. I have followed this guide in an effort to use the toolkit to
generate self-signed certs for user admin, signed by a nifi truststore:

Apache NiFi Walkthroughs
<https://nifi.apache.org/docs/nifi-docs/html/walkthroughs.html>

I seem to be having issues with this in my nifi.properties:
nifi.security.user.authorizer=single-user-authorizer

When I set it to nothing, it tells me this is required. When I set it to
single-user-authorizer, this error results in the log:
 Error creating bean with name 'authorizer': FactoryBean threw exception on
object creation; nested exception is java.lang.Exception: The specified
authorizer 'single-user-authorizer' could not be found.

I suspect my authorizers.xml and/or my login-identity-providers.xml files
are misconfigured. How should those two config files be structured if I
wish to run a secure nifi instance with my self-signed certs,
generated using the nifi toolkit?


Re: Failing to start - keystore properties invalid

2022-12-28 Thread James McMahon
This morning through further research I came across this by Bryan Bende: Apache
NiFi 1.14.0 - Secure by Default (bryanbende.com)
<https://bryanbende.com/development/2021/07/19/apache-nifi-1-14-0-secure-by-default>
It appears that beginning with Apache NiFi 1.14.0, it is possible to have
nifi establish the truststore and keystore if they are not present at
startup. So I tried this, bearing in mind that  I am trying to start up
v1.16.3.

My nifi.properties has these parms set in it:
nifi.web.https.host=ec2-52-4-149-72.compute-1.amazonaws.com
nifi.web.https.port=8443
nifi.security.autoreload.enabled=false
nifi.security.autoreload.interval=10 secs
nifi.security.keystore=./conf/keystore.p12
nifi.security.keystoreType=PKCS12
nifi.security.keystorePasswd=
nifi.security.keyPasswd=
nifi.security.truststore=./conf/truststore.p12
nifi.security.truststoreType=PKCS12
nifi.security.truststorePasswd=
nifi.security.user.authorizer=single-user-authorizer
nifi.security.allow.anonymous.authentication=false
nifi.security.user.login.identity.provider=single-user-provider


 My authorizers.xml:

<authorizers>
    <authorizer>
        <identifier>single-user-authorizer</identifier>
        <class>org.apache.nifi.authorization.single.user.SingleUserAuthorizer</class>
    </authorizer>
</authorizers>

My login-identity-providers.xml:

<loginIdentityProviders>
    <provider>
        <identifier>single-user-provider</identifier>
        <class>org.apache.nifi.authentication.single.user.SingleUserLoginIdentityProvider</class>
    </provider>
</loginIdentityProviders>

And even in this minimalist state, startup fails with this entry in the
nifi-app.log:
2022-12-28 13:59:21,744 INFO [main] o.a.n.r.v.FileBasedVariableRegistry
Loaded a total of 90 properties.  Including precedence overrides effective
accessible registry key
 size is 90
2022-12-28 13:59:22,117 WARN [main]
o.a.nifi.security.util.SslContextFactory Some truststore properties are
populated (./conf/truststore.p12, null, PKCS12) but not valid
2022-12-28 13:59:22,117 ERROR [main]
o.apache.nifi.controller.FlowController Unable to start the flow controller
because the TLS configuration was invalid: The truststore
 properties are not valid
2022-12-28 13:59:22,154 ERROR [main] o.s.web.context.ContextLoader Context
initialization failed
org.springframework.beans.factory.BeanCreationException: Error creating
bean with name
'org.springframework.security.config.annotation.web.configuration.WebSecurityConfig
uration': Initialization of bean failed; nested exception is
org.springframework.beans.factory.UnsatisfiedDependencyException: Error
creating bean with name 'org.apache.n
ifi.web.NiFiWebApiSecurityConfiguration': Unsatisfied dependency expressed
through method 'setJwtAuthenticationProvider' parameter 0; nested exception
is org.springframew
ork.beans.factory.UnsatisfiedDependencyException: Error creating bean with
name
'org.apache.nifi.web.security.configuration.JwtAuthenticationSecurityConfiguration':
Unsat
isfied dependency expressed through constructor parameter 3; nested
exception is org.springframework.beans.factory.BeanCreationException: Error
creating bean with name 'f
lowController': FactoryBean threw exception on object creation; nested
exception is java.lang.IllegalStateException: Flow controller TLS
configuration is invalid


Bryan, if you see this can you please comment?

On Tue, Dec 27, 2022 at 4:13 PM James McMahon  wrote:

> Hello. I am trying to start a secure instance of nifi version 1.16.3. I am
> getting this error on start attempt:
>
> 2022-12-27 20:44:21,765 INFO [main] o.a.n.r.v.FileBasedVariableRegistry
> Loaded a total of 90 properties.  Including precedence overrides effective
> accessible registry key size is 90
> 2022-12-27 20:44:21,972 WARN [main]
> o.a.nifi.security.util.SslContextFactory Some keystore properties are
> populated (/opt/nifi/config_resources/keys/server.jks, , ,
> JKS) but not valid
> 2022-12-27 20:44:21,972 ERROR [main]
> o.apache.nifi.controller.FlowController Unable to start the flow controller
> because the TLS configuration was invalid: The keystore properties are not
> valid
> 2022-12-27 20:44:22,009 ERROR [main] o.s.web.context.ContextLoader Context
> initialization failed
> org.springframework.beans.factory.BeanCreationException: Error creating
> bean with name
> 'org.springframework.security.config.annotation.web.configuration.WebSecurityConfiguration':
> Initialization of bean failed; nested exception is
> org.springframework.beans.factory.UnsatisfiedDependencyException: Error
> creating bean with name
> 'org.apache.nifi.web.NiFiWebApiSecurityConfiguration': Unsatisfied
> dependency expressed through method 'setJwtAuthenticationProvider'
> parameter 0; nested exception is
> org.springframework.beans.factory.UnsatisfiedDependencyException: Error
> creating bean with name
> 'org.apache.nifi.web.security.configuration.JwtAuthenticationSecurityConfiguration':
> Unsatisfied dependency expressed through constructor parameter 3; nested
> exception is org.springframework.beans.factory.BeanCreationException: Error
> creating 

Failing to start - keystore properties invalid

2022-12-27 Thread James McMahon
Hello. I am trying to start a secure instance of nifi version 1.16.3. I am
getting this error on start attempt:

2022-12-27 20:44:21,765 INFO [main] o.a.n.r.v.FileBasedVariableRegistry
Loaded a total of 90 properties.  Including precedence overrides effective
accessible registry key size is 90
2022-12-27 20:44:21,972 WARN [main]
o.a.nifi.security.util.SslContextFactory Some keystore properties are
populated (/opt/nifi/config_resources/keys/server.jks, , ,
JKS) but not valid
2022-12-27 20:44:21,972 ERROR [main]
o.apache.nifi.controller.FlowController Unable to start the flow controller
because the TLS configuration was invalid: The keystore properties are not
valid
2022-12-27 20:44:22,009 ERROR [main] o.s.web.context.ContextLoader Context
initialization failed
org.springframework.beans.factory.BeanCreationException: Error creating
bean with name
'org.springframework.security.config.annotation.web.configuration.WebSecurityConfiguration':
Initialization of bean failed; nested exception is
org.springframework.beans.factory.UnsatisfiedDependencyException: Error
creating bean with name
'org.apache.nifi.web.NiFiWebApiSecurityConfiguration': Unsatisfied
dependency expressed through method 'setJwtAuthenticationProvider'
parameter 0; nested exception is
org.springframework.beans.factory.UnsatisfiedDependencyException: Error
creating bean with name
'org.apache.nifi.web.security.configuration.JwtAuthenticationSecurityConfiguration':
Unsatisfied dependency expressed through constructor parameter 3; nested
exception is org.springframework.beans.factory.BeanCreationException: Error
creating bean with name 'flowController': FactoryBean threw exception on
object creation; nested exception is java.lang.IllegalStateException: Flow
controller TLS configuration is invalid



This is what my nifi.properties file looks like in this section:

# security properties #
nifi.sensitive.props.key=A_KEY_HERE
nifi.sensitive.props.key.protected=
nifi.sensitive.props.algorithm=NIFI_PBKDF2_AES_GCM_256
nifi.sensitive.props.additional.keys=

nifi.security.autoreload.enabled=false
nifi.security.autoreload.interval=10 secs
nifi.security.keystore=/opt/nifi/config_resources/keys/server.jks
nifi.security.keystoreType=JKS
nifi.security.keystorePasswd=b0gu5passw0r2!
nifi.security.keyPasswd=b0gu5passw0r2!
nifi.security.truststore=/opt/nifi/config_resources/keys/truststore.jks
nifi.security.truststoreType=JKS
nifi.security.truststorePasswd=Diff3r3ntBoguspwd#
nifi.security.user.authorizer=managed-authorizer
nifi.security.allow.anonymous.authentication=false
nifi.security.user.login.identity.provider=
nifi.security.user.jws.key.rotation.period=
nifi.security.ocsp.responder.url=
nifi.security.ocsp.responder.certificate=

I have verified the password for my keystore at the command line (this
works):

sudo keytool -list -v -keystore server.jks
Enter keystore password: b0gu5passw0r2!
(I see the result)

These JKS files were converted by me from a cacert.pem (to truststore.jks)
and a server.pfx (for server.jks) using keytool. The cacert.pem and the
server.pfx were created by me at TinyCert.org.

I thought my keyPasswd should be the same as my keystorePasswd, but am I
wrong about that? Is it possible that the keyPasswd is the password or
passphrase I employed when I created the original server.pfx file?
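
If it helps anyone confirm this, one way I think the key password could be tested
on its own is to generate a throwaway CSR, since keytool has to unlock the private
key for that (a sketch; the alias "server" is a guess on my part, I would take the
real alias from the keytool -list output):

# list entries to confirm the private key alias (works with the store password alone)
sudo keytool -list -v -keystore server.jks -storepass 'b0gu5passw0r2!'

# generate a throwaway CSR; this touches the private key, so a wrong -keypass should fail here
sudo keytool -certreq -alias server -keystore server.jks -storepass 'b0gu5passw0r2!' -keypass 'b0gu5passw0r2!' -file /tmp/test.csr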

What is this error telling me, and how can I fix it?

To summarize, this is how I got to where I am:
I created a cacert.pem, an admin.pfx, server.pfx, and client1.pfx using
TinyCert.
While in TinyCert.org I was signed in with a password and a passphrase.
I transferred those to my keys directory under my nifi install and used
keytool to create a truststore.jks, a server.jks, a client1.jks, and an
admin.jks keystore file.
Each jks has its own password.
I can look at the contents of my truststore,jks, my admin.jks, my
server.jks, and my client1.jks using keytool, with the password I provided
to keytool for admin at the time of conversion.

Jim


Re: authorizers.xml for simple single user configuration?

2022-12-13 Thread James McMahon
Thank you Bryan. I do have that declared in login-identity-providers.xml:
[ec2-user@ip-172-31-73-197 conf]$ more login-identity-providers.xml

<loginIdentityProviders>
    <provider>
        <class>org.apache.nifi.authentication.single.user.SingleUserLoginIdentityProvider</class>
        <identifier>single-user-provider</identifier>
    </provider>
</loginIdentityProviders>

I am not sure I can answer your question. I thought authorizers.xml, and
the other xml conf files were required. For my simplified use case, what is
the bare minimum configuration to include in the authorizers.xml and
login-identity-providers.xml?

I realize my use case is not a good long-term objective. But I want to get
a nifi instance running in a minimalist form, and then after I do that
tackle authorization, https, etc.

On Tue, Dec 13, 2022 at 1:48 PM Bryan Bende  wrote:

> The SingleUserAuthorizer requires using the
> SingleUserLoginIdentityProvider, do you have that declared in
> login-identity-providers.xml?
>
> Also if you are trying to remove authentication/authorization and run
> over http, then why declare the SingleUserAuthorizer at all?
>
> On Tue, Dec 13, 2022 at 1:43 PM James McMahon 
> wrote:
> >
> > Hello. I am having difficulty getting nifi to start for a simple single
> node configuration without user authentication. My goal is to get a nifi
> instance running over http. I understood that there would be no user
> authentication in such a case. Why then is my nifi instance failing to
> start with these errors thrown for authorizers.xml  (example of the errors
> at bottom)?
> >
> > I have no FileUserGroupProvider or LdapUserGroupProvider to access for
> user account information. I establish a user named nifi and group named
> nifi at the time I run my playbook.
> >
> > Currently I have only this in my authorizers.xml file.
> > 
> > 
> > 
> >   
> > single-user-authorizer
> >
>  org.apache.nifi.authorization.single.user.SingleUserAuthorizer
> >   
> > 
> > I run my ansible playbook as user ec2-user, and the ansible role
> establishes a user nifi. How must authorizers.xml be configured for such a
> single-node nifi configuration?
> >
> > When I attempt to start nifi I get a series of errors like these in
> nifi-app.log. I suspect my authorizers.xml is missing info.
> >
> > Caused by:
> org.springframework.beans.factory.UnsatisfiedDependencyException: Error
> creating bean with name
> 'org.springframework.security.config.annotation.method.configuration.GlobalMethodSecurityConfiguration':
> Unsatisfied dependency expressed through method 'setObjectPostProcessor'
> parameter 0; nested exception is
> org.springframework.beans.factory.UnsatisfiedDependencyException: Error
> creating
> >  bean with name
> 'org.apache.nifi.web.security.configuration.AuthenticationSecurityConfiguration':
> Unsatisfied dependency expressed through constructor parameter 2; nested
> exception is org.springframe
> > work.beans.factory.BeanCreationException: Error creating bean with name
> 'authorizer': FactoryBean threw exception on object creation; nested
> exception is java.lang.Exception: Unable to load the authorizer
> configuration file at: /opt/nifi/releases/nifi-1.16.3/./conf/authorizers.xml
>


authorizers.xml for simple single user configuration?

2022-12-13 Thread James McMahon
Hello. I am having difficulty getting nifi to start for a simple single
node configuration without user authentication. My goal is to get a nifi
instance running over http. I understood that there would be no user
authentication in such a case. Why then is my nifi instance failing to
start with these errors thrown for authorizers.xml  (example of the errors
at bottom)?

I have no FileUserGroupProvider or LdapUserGroupProvider to access for user
account information. I establish a user named nifi and group named nifi at
the time I run my playbook.

Currently I have only this in my authorizers.xml file.

<authorizers>
    <authorizer>
        <identifier>single-user-authorizer</identifier>
        <class>org.apache.nifi.authorization.single.user.SingleUserAuthorizer</class>
    </authorizer>
</authorizers>

I run my ansible playbook as user ec2-user, and the ansible role
establishes a user nifi. How must authorizers.xml be configured for such a
single-node nifi configuration?

When I attempt to start nifi I get a series of errors like these in
nifi-app.log. I suspect my authorizers.xml is missing info.

Caused by:
org.springframework.beans.factory.UnsatisfiedDependencyException: Error
creating bean with name
'org.springframework.security.config.annotation.method.configuration.GlobalMethodSecurityConfiguration':
Unsatisfied dependency expressed through method 'setObjectPostProcessor'
parameter 0; nested exception is
org.springframework.beans.factory.UnsatisfiedDependencyException: Error
creating
 bean with name
'org.apache.nifi.web.security.configuration.AuthenticationSecurityConfiguration':
Unsatisfied dependency expressed through constructor parameter 2; nested
exception is org.springframe
work.beans.factory.BeanCreationException: Error creating bean with name
'authorizer': FactoryBean threw exception on object creation; nested
exception is java.lang.Exception: Unable to load the authorizer
configuration file at: /opt/nifi/releases/nifi-1.16.3/./conf/authorizers.xml


Re: Error on nifi start

2022-12-13 Thread James McMahon
I now try to set this at the end of my bootstrap.conf file:
java.arg.snappy=-Dorg.xerial.snappy.tempdir=/usr/hdf/current/nifi/tmp

But this only throws another ERROR to the log that causes nifi to fail:
2022-12-13 15:09:49,284 ERROR [NiFi logging handler] org.apache.nifi.StdErr
java.lang.reflect.InaccessibleObjectException: Unable to make protected
final java.lang.Class java.lang.ClassLoader.defineC
lass(java.lang.String,byte[],int,int,java.security.ProtectionDomain) throws
java.lang.ClassFormatError accessible: module java.base does not "opens
java.lang" to unnamed module @2326180c
2022-12-13 15:09:49,285 ERROR [NiFi logging handler]
org.apache.nifi.StdErr at
java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:354)
2022-12-13 15:09:49,285 ERROR [NiFi logging handler]
org.apache.nifi.StdErr at
java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
2022-12-13 15:09:49,286 ERROR [NiFi logging handler]
org.apache.nifi.StdErr at
java.base/java.lang.reflect.Method.checkCanSetAccessible(Method.java:199)
2022-12-13 15:09:49,286 ERROR [NiFi logging handler]
org.apache.nifi.StdErr at
java.base/java.lang.reflect.Method.setAccessible(Method.java:193)
2022-12-13 15:09:49,286 ERROR [NiFi logging handler]
org.apache.nifi.StdErr at
org.xerial.snappy.SnappyLoader.injectSnappyNativeLoader(SnappyLoader.java:275)
2022-12-13 15:09:49,286 ERROR [NiFi logging handler]
org.apache.nifi.StdErr at
org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:227)
2022-12-13 15:09:49,286 ERROR [NiFi logging handler]
org.apache.nifi.StdErr at
org.xerial.snappy.Snappy.(Snappy.java:48)
2022-12-13 15:09:49,286 ERROR [NiFi logging handler]
org.apache.nifi.StdErr at
org.apache.nifi.processors.hive.PutHiveStreaming.(PutHiveStreaming.java:158)
2022-12-13 15:09:49,286 ERROR [NiFi logging handler]
org.apache.nifi.StdErr at java.base/java.lang.Class.forName0(Native
Method)
2022-12-13 15:09:49,286 ERROR [NiFi logging handler]
org.apache.nifi.StdErr at
java.base/java.lang.Class.forName(Class.java:467)
2022-12-13 15:09:49,286 ERROR [NiFi logging handler]
org.apache.nifi.StdErr at
org.apache.nifi.nar.StandardExtensionDiscoveringManager.getClass(StandardExtensionDiscoveringManager.java:328)
2022-12-13 15:09:49,286 ERROR [NiFi logging handler]
org.apache.nifi.StdErr at
org.apache.nifi.documentation.DocGenerator.documentConfigurableComponent(DocGenerator.java:100)
2022-12-13 15:09:49,286 ERROR [NiFi logging handler]
org.apache.nifi.StdErr at
org.apache.nifi.documentation.DocGenerator.generate(DocGenerator.java:65)
2022-12-13 15:09:49,286 ERROR [NiFi logging handler]
org.apache.nifi.StdErr at
org.apache.nifi.web.server.JettyServer.start(JettyServer.java:1126)
2022-12-13 15:09:49,286 ERROR [NiFi logging handler]
org.apache.nifi.StdErr at org.apache.nifi.NiFi.(NiFi.java:159)
2022-12-13 15:09:49,286 ERROR [NiFi logging handler]
org.apache.nifi.StdErr at org.apache.nifi.NiFi.(NiFi.java:71)
2022-12-13 15:09:49,286 ERROR [NiFi logging handler]
org.apache.nifi.StdErr at org.apache.nifi.NiFi.main(NiFi.java:303)

Some research suggests this may be fixable by setting this as follows,
presumably to the java startup:
--illegal-access=permit
(see cglib - java.lang.ExceptionInInitializerError with Java-16 |
j.l.ClassFormatError accessible: module java.base does not "opens
java.lang" to unnamed module - Stack Overflow
<https://stackoverflow.com/questions/66974846/java-lang-exceptionininitializererror-with-java-16-j-l-classformaterror-access>
 )

Would I set this in bootstrap.conf? How?
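
My understanding (untested) is that bootstrap.conf takes one JVM argument per
java.arg.<suffix> line, so the flag would go in the same way as the snappy
property above, something like:

# conf/bootstrap.conf - each java.arg.<suffix> line becomes one JVM argument
java.arg.snappy=-Dorg.xerial.snappy.tempdir=/usr/hdf/current/nifi/tmp
java.arg.addopens=--add-opens=java.base/java.lang=ALL-UNNAMED

Is that right? I also gather --illegal-access=permit no longer has any effect on
Java 17, which is why the sketch uses --add-opens instead; since the Ansible role
targets NiFi 1.14.0, dropping back to a Java 11 runtime (e.g. Corretto 11) may be
the simpler fix.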

On Tue, Dec 13, 2022 at 7:45 AM James McMahon  wrote:

> I am using an Ansible role from Ansible GALAXY that has been tested and
> validated up through Apache NiFi v1.14.0. I download and install 1.14.0.bin
> from the Apache NiFi archives for this reason.
>
> I am using ansible to install on an AWS EC2 instance. My java version on
> this instance is:
>
> openjdk 17.0.5 2022-10-18 LTS
>
> OpenJDK Runtime Environment Corretto-17.0.5.8.1
>
> The install goes well. But when nifi attempts to start, it fails with the
> following error message. Is this error indicating a compatibility issue
> with the java installation on AWS? How should I proceed to get nifi to
> start?
>
> 2022-12-13 02:50:39,316 ERROR [main] org.apache.nifi.NiFi Failure to
> launch NiFi due to org.xerial.snappy.SnappyError:
> [FAILED_TO_LOAD_NATIVE_LIBRARY] Unable to make p
>
> rotected final java.lang.Class
> java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
> throws java.lang.ClassFormatError acce
>
> ssible: module java.base does not "opens java.lang" to unnamed module
> @31b289da
>
> org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] Unable to
> make protected final java.lang.Class
> java.lang.

Error on nifi start

2022-12-13 Thread James McMahon
I am using an Ansible role from Ansible GALAXY that has been tested and
validated up through Apache NiFi v1.14.0. I download and install 1.14.0.bin
from the Apache NiFi archives for this reason.

I am using ansible to install on an AWS EC2 instance. My java version on
this instance is:

openjdk 17.0.5 2022-10-18 LTS

OpenJDK Runtime Environment Corretto-17.0.5.8.1

The install goes well. But when nifi attempts to start, it fails with the
following error message. Is this error indicating a compatibility issue
with the java installation on AWS? How should I proceed to get nifi to
start?

2022-12-13 02:50:39,316 ERROR [main] org.apache.nifi.NiFi Failure to launch
NiFi due to org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY]
Unable to make p

rotected final java.lang.Class
java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
throws java.lang.ClassFormatError acce

ssible: module java.base does not "opens java.lang" to unnamed module
@31b289da

org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] Unable to
make protected final java.lang.Class
java.lang.ClassLoader.defineClass(java.lang.String,byte[]

,int,int,java.security.ProtectionDomain) throws java.lang.ClassFormatError
accessible: module java.base does not "opens java.lang" to unnamed module
@31b289da

at
org.xerial.snappy.SnappyLoader.injectSnappyNativeLoader(SnappyLoader.java:297)

at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:227)

at org.xerial.snappy.Snappy.(Snappy.java:48)

at
org.apache.nifi.processors.hive.PutHiveStreaming.(PutHiveStreaming.java:158)

at java.base/java.lang.Class.forName0(Native Method)

at java.base/java.lang.Class.forName(Class.java:467)

at
org.apache.nifi.nar.StandardExtensionDiscoveringManager.getClass(StandardExtensionDiscoveringManager.java:328)

at
org.apache.nifi.documentation.DocGenerator.documentConfigurableComponent(DocGenerator.java:100)

at
org.apache.nifi.documentation.DocGenerator.generate(DocGenerator.java:65)

at
org.apache.nifi.web.server.JettyServer.start(JettyServer.java:1126)

at org.apache.nifi.NiFi.(NiFi.java:159)

at org.apache.nifi.NiFi.(NiFi.java:71)

at org.apache.nifi.NiFi.main(NiFi.java:303)


Re: Unable to start nifi service

2022-12-11 Thread James McMahon
I do see an ERROR in my nifi-app.log. Anyone have any experience addressing
this problem? I'm going to see what it says in the Admin Guide, as the
ERROR suggests. I'm not certain I understand how to set this sensitive
properties key. I'm assuming it has something to do with sensitive
properties in parameter contexts.

Any guidance would still be appreciated. I need to get this running. Thanks
in advance for any help.

2022-12-11 02:54:03,338 ERROR [main]
o.a.nifi.properties.NiFiPropertiesLoader Flow Configuration
[./conf/flow.json.gz] Found: Migration Required for blank Sensitive
Properties Key [nifi.sensitive.props.k
ey]
2022-12-11 02:54:03,340 ERROR [main] org.apache.nifi.NiFi Failure to launch
NiFi
java.lang.IllegalArgumentException: There was an issue decrypting protected
properties
at org.apache.nifi.NiFi.initializeProperties(NiFi.java:375)
at
org.apache.nifi.NiFi.convertArgumentsToValidatedNiFiProperties(NiFi.java:343)
at
org.apache.nifi.NiFi.convertArgumentsToValidatedNiFiProperties(NiFi.java:339)
at org.apache.nifi.NiFi.main(NiFi.java:331)
Caused by: org.apache.nifi.properties.SensitivePropertyProtectionException:
Sensitive Properties Key [nifi.sensitive.props.key] not found: See Admin
Guide section [Updating the Sensitive Properties Key]
at
org.apache.nifi.properties.NiFiPropertiesLoader.getDefaultProperties(NiFiPropertiesLoader.java:239)
at
org.apache.nifi.properties.NiFiPropertiesLoader.get(NiFiPropertiesLoader.java:218)
at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
Method)
at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at org.apache.nifi.NiFi.initializeProperties(NiFi.java:370)
... 3 common frames omitted
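
From a first read of the Admin Guide section the error points at, the fix seems
to be a nifi.sh subcommand. A sketch of what I plan to try (the key value below
is just a placeholder, and it must be at least 12 characters):

# run from the NiFi install directory while NiFi is stopped;
# this writes nifi.sensitive.props.key and re-encrypts the existing flow
sudo ./bin/nifi.sh set-sensitive-properties-key 'SomeLongRandomValue123'

If there is a better way to set nifi.sensitive.props.key, I'd welcome it.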


On Sat, Dec 10, 2022 at 11:55 PM Jeremy Pemberton-Pigott <
fuzzych...@gmail.com> wrote:

> Instead of the bootstrap log check the nifi-app.log. It might be out of
> memory or have the port it needs already in use.
>
> Regards,
>
> Jeremy
>
>
> On 11 Dec 2022, at 11:01, James McMahon  wrote:
>
> 
> I am trying to start nifi on an AWS EC2 instance. My bootstrap.conf says
> the service does not start, but I see no indication why. I am trying to
> start the nifi service using
>
> sudo ../bin/nifi.sh start
>
> How can I debug this? Here is the nifi-bootstrap log:
>
> 2022-12-11 02:54:02,932 INFO [main] org.apache.nifi.bootstrap.Command
> Command: java -classpath
> /opt/nifi/releases/nifi-1.18.0/./conf:/opt/nifi/releases/nifi-1.18.0/./lib/javax.servlet-api-3.1.0.jar:/
>
> opt/nifi/releases/nifi-1.18.0/./lib/jetty-schemas-5.2.jar:/opt/nifi/releases/nifi-1.18.0/./lib/logback-classic-1.2.11.jar:/opt/nifi/releases/nifi-1.18.0/./lib/logback-core-1.2.11.jar:/opt/nifi/releas
>
> es/nifi-1.18.0/./lib/jcl-over-slf4j-1.7.36.jar:/opt/nifi/releases/nifi-1.18.0/./lib/jul-to-slf4j-1.7.36.jar:/opt/nifi/releases/nifi-1.18.0/./lib/log4j-over-slf4j-1.7.36.jar:/opt/nifi/releases/nifi-1.
>
> 18.0/./lib/nifi-api-1.18.0.jar:/opt/nifi/releases/nifi-1.18.0/./lib/nifi-framework-api-1.18.0.jar:/opt/nifi/releases/nifi-1.18.0/./lib/nifi-server-api-1.18.0.jar:/opt/nifi/releases/nifi-1.18.0/./lib/
>
> nifi-runtime-1.18.0.jar:/opt/nifi/releases/nifi-1.18.0/./lib/nifi-nar-utils-1.18.0.jar:/opt/nifi/releases/nifi-1.18.0/./lib/nifi-properties-1.18.0.jar:/opt/nifi/releases/nifi-1.18.0/./lib/nifi-proper
>
> ty-utils-1.18.0.jar:/opt/nifi/releases/nifi-1.18.0/./lib/nifi-stateless-bootstrap-1.18.0.jar:/opt/nifi/releases/nifi-1.18.0/./lib/nifi-stateless-api-1.18.0.jar:/opt/nifi/releases/nifi-1.18.0/./lib/sl
>
> f4j-api-1.7.36.jar:/opt/nifi/releases/nifi-1.18.0/./lib/java11/jakarta.activation-api-1.2.2.jar:/opt/nifi/releases/nifi-1.18.0/./lib/java11/jakarta.activation-1.2.2.jar:/opt/nifi/releases/nifi-1.18.0
>
> /./lib/java11/jakarta.xml.bind-api-2.3.3.jar:/opt/nifi/releases/nifi-1.18.0/./lib/java11/jaxb-runtime-2.3.5.jar:/opt/nifi/releases/nifi-1.18.0/./lib/java11/txw2-2.3.5.jar:/opt/nifi/releases/nifi-1.18
> .0/./lib/java11/istack-commons-runtime-3.0.12.jar:/opt/nifi/releases/nifi-1.18.0/./lib/java11/javax.annotation-api-1.3.2.jar
> -Dorg.apache.jasper.compiler.disablejsr199=true -Xmx512m -Xms512m -Dcurato
> r-log-only-first-connection-issue-as-error-level=true
> -Djavax.security.auth.useSubjectCredsOnly=true
> -Djava.security.egd=file:/dev/urandom -Dzookeeper.admin.enableServer=false
> -Dsun.net.http.allowRes
> trictedHeaders=true -Djava.net.preferIPv4Stack=true
> -Djava.awt.headless=true -Djava.protocol.handler.pkgs=sun.net.www.protocol
> -Dnifi.properties.file.path=/opt/nifi/releases/nifi-1.18.0/./conf/nifi.p
> roperties -Dnifi.bootstrap.listen.

Unable to start nifi service

2022-12-10 Thread James McMahon
I am trying to start nifi on an AWS EC2 instance. My nifi-bootstrap log says
the service never started, but I see no indication why. I am trying to
start the nifi service using

sudo ../bin/nifi.sh start

How can I debug this? Here is the nifi-bootstrap log:

2022-12-11 02:54:02,932 INFO [main] org.apache.nifi.bootstrap.Command
Command: java -classpath
/opt/nifi/releases/nifi-1.18.0/./conf:/opt/nifi/releases/nifi-1.18.0/./lib/javax.servlet-api-3.1.0.jar:/
opt/nifi/releases/nifi-1.18.0/./lib/jetty-schemas-5.2.jar:/opt/nifi/releases/nifi-1.18.0/./lib/logback-classic-1.2.11.jar:/opt/nifi/releases/nifi-1.18.0/./lib/logback-core-1.2.11.jar:/opt/nifi/releas
es/nifi-1.18.0/./lib/jcl-over-slf4j-1.7.36.jar:/opt/nifi/releases/nifi-1.18.0/./lib/jul-to-slf4j-1.7.36.jar:/opt/nifi/releases/nifi-1.18.0/./lib/log4j-over-slf4j-1.7.36.jar:/opt/nifi/releases/nifi-1.
18.0/./lib/nifi-api-1.18.0.jar:/opt/nifi/releases/nifi-1.18.0/./lib/nifi-framework-api-1.18.0.jar:/opt/nifi/releases/nifi-1.18.0/./lib/nifi-server-api-1.18.0.jar:/opt/nifi/releases/nifi-1.18.0/./lib/
nifi-runtime-1.18.0.jar:/opt/nifi/releases/nifi-1.18.0/./lib/nifi-nar-utils-1.18.0.jar:/opt/nifi/releases/nifi-1.18.0/./lib/nifi-properties-1.18.0.jar:/opt/nifi/releases/nifi-1.18.0/./lib/nifi-proper
ty-utils-1.18.0.jar:/opt/nifi/releases/nifi-1.18.0/./lib/nifi-stateless-bootstrap-1.18.0.jar:/opt/nifi/releases/nifi-1.18.0/./lib/nifi-stateless-api-1.18.0.jar:/opt/nifi/releases/nifi-1.18.0/./lib/sl
f4j-api-1.7.36.jar:/opt/nifi/releases/nifi-1.18.0/./lib/java11/jakarta.activation-api-1.2.2.jar:/opt/nifi/releases/nifi-1.18.0/./lib/java11/jakarta.activation-1.2.2.jar:/opt/nifi/releases/nifi-1.18.0
/./lib/java11/jakarta.xml.bind-api-2.3.3.jar:/opt/nifi/releases/nifi-1.18.0/./lib/java11/jaxb-runtime-2.3.5.jar:/opt/nifi/releases/nifi-1.18.0/./lib/java11/txw2-2.3.5.jar:/opt/nifi/releases/nifi-1.18
.0/./lib/java11/istack-commons-runtime-3.0.12.jar:/opt/nifi/releases/nifi-1.18.0/./lib/java11/javax.annotation-api-1.3.2.jar
-Dorg.apache.jasper.compiler.disablejsr199=true -Xmx512m -Xms512m -Dcurato
r-log-only-first-connection-issue-as-error-level=true
-Djavax.security.auth.useSubjectCredsOnly=true
-Djava.security.egd=file:/dev/urandom -Dzookeeper.admin.enableServer=false
-Dsun.net.http.allowRes
trictedHeaders=true -Djava.net.preferIPv4Stack=true
-Djava.awt.headless=true -Djava.protocol.handler.pkgs=sun.net.www.protocol
-Dnifi.properties.file.path=/opt/nifi/releases/nifi-1.18.0/./conf/nifi.p
roperties -Dnifi.bootstrap.listen.port=40671 -Dapp=NiFi
-Dorg.apache.nifi.bootstrap.config.log.dir=/opt/nifi/releases/nifi-1.18.0/logs
org.apache.nifi.NiFi
2022-12-11 02:54:02,976 INFO [main] org.apache.nifi.bootstrap.Command
Launched Apache NiFi with Process ID 26353
2022-12-11 02:54:03,979 INFO [main] org.apache.nifi.bootstrap.RunNiFi NiFi
never started. Will not restart NiFi
2022-12-11 02:54:10,317 INFO [main] o.a.n.b.NotificationServiceManager
Successfully loaded the following 0 services: []
2022-12-11 02:54:10,320 INFO [main] org.apache.nifi.bootstrap.RunNiFi
Registered no Notification Services for Notification Type NIFI_STARTED
2022-12-11 02:54:10,320 INFO [main] org.apache.nifi.bootstrap.RunNiFi
Registered no Notification Services for Notification Type NIFI_STOPPED
2022-12-11 02:54:10,320 INFO [main] org.apache.nifi.bootstrap.RunNiFi
Registered no Notification Services for Notification Type NIFI_DIED
2022-12-11 02:54:10,337 INFO [main] org.apache.nifi.bootstrap.Command
Apache NiFi is not running


Customizing NiFi in a Docker Container on EC2

2022-11-11 Thread James McMahon
The NiFi System Administration Guide makes many recommendations for
configuration changes to optimize nifi performance. These "best
practices", for example:
* 
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#configuration-best-practices
Placement of repos on separate disk devices is another big one; here
is an example:
* 
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#content-repository

I have nifi installed in a docker container on an EC2 instance.

As far as I can tell, it is not optimized, and I am writing today to ask
whether anyone has tuned a containerized NiFi like this with success. If so,
I'd like to learn more about how you did it.


I use the command
docker exec -it nifi /bin/bash
to review what I understand to be the nifi directories and config files in
the container.

I notice nifi in the container is in many ways not configured to Apache
NiFi recommendations for optimal performance. For example, the
nifi.properties repo params for the docker installation all show repos
placed on one common disk
device (the one where the container lives, presumably).

I've configured external ebs volumes that I've mounted on my instance.
My intention: one
for content_repository, one for flowfile_repository, and likewise for
database and provenance repositories. I'd like to have the containerized
nifi write to and read from those so that I don't bottleneck performance
reading and writing to the same device for repos.

I need to persist my changes to nifi config files. How does one avoid
making changes in nifi.properties and the like that are lost when the
docker container is stopped, deleted, and a new one instantiated?

I need to engage with external repo resources when nifi runs within my
container.
How do we direct nifi in the container to use those external resources
outside of the container to host content_repository, etc etc?
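
The direction I have in mind (a sketch only; the hostname is elided, the host
paths are placeholders for my mounted EBS volumes, and the container paths are
what I believe the apache/nifi image uses under /opt/nifi/nifi-current) is to
bind-mount the conf directory and each repository directory, so edits to
nifi.properties survive container recreation and each repo lands on its own
device:

docker run --name nifi -p 8443:8443 \
  -e NIFI_WEB_PROXY_HOST=ec2-xx-xx-xx-xx.compute-1.amazonaws.com:8443 \
  -v /data/conf:/opt/nifi/nifi-current/conf \
  -v /ebs1/content_repository:/opt/nifi/nifi-current/content_repository \
  -v /ebs2/flowfile_repository:/opt/nifi/nifi-current/flowfile_repository \
  -v /ebs3/database_repository:/opt/nifi/nifi-current/database_repository \
  -v /ebs4/provenance_repository:/opt/nifi/nifi-current/provenance_repository \
  -d apache/nifi:latest

Does that match how others have done it, or is baking a customized
nifi.properties into a derived image the better practice?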

Thank you in advance for any help.
Jim


Re: NiFi on AWS EC2

2022-11-10 Thread James McMahon
I used the command
docker exec -it nifi /bin/bash
to review what I understand to be the nifi directories and config files in
the container.

I notice nifi in the container is in many ways not configured to Apache
NiFi recommendations for optimal performance. For example, the
nifi.properties repo params all refer to repos placed on one common disk
device (the one where the container lives, presumably).

I've configured external ebs volumes that I've mounted on my instance. One
for content_repository, one for flowfile_repository, and likewise for
database and provenance repositories. I'd like to have the containerized
nifi write to and read from those so that I don't bottleneck performance
reading and writing to the same device for repos.

I need to persist my changes to nifi config files. How does one avoid
making changes in nifi.properties and the like that are lost when the
docker container is stopped, deleted, and a new one instantiated?

I need to leverage external resources when nifi runs within my container.
How do we direct nifi in the container to use those external resources
outside of the container to host content_repository, etc etc?

Thank you in advance for any help.
Jim

On Tue, Nov 8, 2022 at 10:28 PM David Handermann <
exceptionfact...@apache.org> wrote:

> Jim,
>
> You're welcome! Thanks for following up and confirming the solution, great
> collaborative effort!
>
> Regard,
> David Handermann
>
>
>
>
> On Tue, Nov 8, 2022, 7:25 PM James McMahon  wrote:
>
>> That was it. Adding the port to the docker run command proxy got me to
>> the promised land. I was then able to use the userid and password from the
>> docker log to access nifi on my ec2 instance.
>>
>> David, Dmitry - thank you so much. This was a huge help to me, and I hope
>> it will help others trying the same approach in the future.
>> Jim
>>
>> On Tue, Nov 8, 2022 at 8:13 PM David Handermann <
>> exceptionfact...@apache.org> wrote:
>>
>>> It may also be necessary to include the port in the host variable:
>>>
>>> docker run --name nifi -p 8443:8443 -e NIFI_WEB_PROXY_HOST=
>>> ec2-3-238-27-220.compute-1.amazonaws.com:8443 -d apache/nifi:latest
>>>
>>> It is possible to access the configuration and logs files using an
>>> interactive shell with the following Docker command:
>>>
>>> docker exec -it nifi /bin/bash
>>>
>>> Regards,
>>> David Handermann
>>>
>>> On Tue, Nov 8, 2022 at 7:09 PM Dmitry Stepanov 
>>> wrote:
>>>
>>>> Make sure you use your full domain name
>>>> ec2-3-238-27-220.compute-1.amazonaws.com
>>>> David shorten it in his code
>>>>
>>>> On November 8, 2022 5:57:26 p.m. James McMahon 
>>>> wrote:
>>>>
>>>>> Thank you, David. I’ve made that change, adding the proxy host
>>>>> specification on the docker command line. I continue to get the same error
>>>>> message. Is it possible I need to indicate my key on the docker command
>>>>> line too?
>>>>>
>>>>> Related, how can one access nifi.properties and the usual nifi config
>>>>> files, as well as the family of nifi-app.log files and bootstrap.conf, 
>>>>> when
>>>>> nifi is running inside a docker container?
>>>>>
>>>>> Thanks again for sticking with this. I feel like we’re getting closer.
>>>>> Jim
>>>>>
>>>>> On Tue, Nov 8, 2022 at 7:31 PM David Handermann <
>>>>> exceptionfact...@apache.org> wrote:
>>>>>
>>>>>> Hi Jim,
>>>>>>
>>>>>> Good adjustment on the security group inbound rules.
>>>>>>
>>>>>> The error page is the result of NiFi receiving an unexpected HTTP
>>>>>> Host header, not matching one of the expected values.
>>>>>>
>>>>>> For this to work, it is possible to pass the external DNS name as the
>>>>>> value of the NIFI_WEB_PROXY_HOST environment variable. This can be
>>>>>> specified in the docker run command as follows:
>>>>>>
>>>>>> docker run --name nifi -p 8443:8443 -e NIFI_WEB_PROXY_HOST=ec2...
>>>>>> amazonaws.com -d apache/nifi:latest
>>>>>>
>>>>>> That will allow NiFi to accept the Host header from the browser, and
>>>>>> then present the login screen.
>>>>>>
>>>>>> Regards,
>>>>>> David Handermann
>>>>>>
>&g

Re: NiFi on AWS EC2

2022-11-08 Thread James McMahon
That was it. Adding the port to the docker run command proxy got me to the
promised land. I was then able to use the userid and password from the
docker log to access nifi on my ec2 instance.

David, Dmitry - thank you so much. This was a huge help to me, and I hope
it will help others trying the same approach in the future.
Jim

On Tue, Nov 8, 2022 at 8:13 PM David Handermann 
wrote:

> It may also be necessary to include the port in the host variable:
>
> docker run --name nifi -p 8443:8443 -e NIFI_WEB_PROXY_HOST=
> ec2-3-238-27-220.compute-1.amazonaws.com:8443 -d apache/nifi:latest
>
> It is possible to access the configuration and logs files using an
> interactive shell with the following Docker command:
>
> docker exec -it nifi /bin/bash
>
> Regards,
> David Handermann
>
> On Tue, Nov 8, 2022 at 7:09 PM Dmitry Stepanov 
> wrote:
>
>> Make sure you use your full domain name
>> ec2-3-238-27-220.compute-1.amazonaws.com
>> David shorten it in his code
>>
>> On November 8, 2022 5:57:26 p.m. James McMahon 
>> wrote:
>>
>>> Thank you, David. I’ve made that change, adding the proxy host
>>> specification on the docker command line. I continue to get the same error
>>> message. Is it possible I need to indicate my key on the docker command
>>> line too?
>>>
>>> Related, how can one access nifi.properties and the usual nifi config
>>> files, as well as the family of nifi-app.log files and bootstrap.conf, when
>>> nifi is running inside a docker container?
>>>
>>> Thanks again for sticking with this. I feel like we’re getting closer.
>>> Jim
>>>
>>> On Tue, Nov 8, 2022 at 7:31 PM David Handermann <
>>> exceptionfact...@apache.org> wrote:
>>>
>>>> Hi Jim,
>>>>
>>>> Good adjustment on the security group inbound rules.
>>>>
>>>> The error page is the result of NiFi receiving an unexpected HTTP Host
>>>> header, not matching one of the expected values.
>>>>
>>>> For this to work, it is possible to pass the external DNS name as the
>>>> value of the NIFI_WEB_PROXY_HOST environment variable. This can be
>>>> specified in the docker run command as follows:
>>>>
>>>> docker run --name nifi -p 8443:8443 -e NIFI_WEB_PROXY_HOST=ec2...
>>>> amazonaws.com -d apache/nifi:latest
>>>>
>>>> That will allow NiFi to accept the Host header from the browser, and
>>>> then present the login screen.
>>>>
>>>> Regards,
>>>> David Handermann
>>>>
>>>> On Tue, Nov 8, 2022 at 6:06 PM James McMahon 
>>>> wrote:
>>>>
>>>>> Hi David. This is very helpful, thank you. I feel like I am close, but
>>>>> I get an error. My Inbound Rules for my security group now include:
>>>>> 8443 TCP (MyIP)/32
>>>>> 443 TCP (MyIP)/32
>>>>> 22 TCP (MyIP)/32
>>>>>
>>>>> In my browser - I tried both Edge and Chrome - I use this
>>>>> URL:
>>>>> https://ec2-3-238-27-230.compute-1.amazonaws.com:8443
>>>>> I have also tried with /nifi at the tail end.
>>>>>
>>>>> I get this error:
>>>>>
>>>>> *System Error*
>>>>>
>>>>> *The request contained an invalid host header
>>>>> [ec2-3-238-27-220.compute-1.amazonaws.com:8443
>>>>> <http://ec2-3-238-27-220.compute-1.amazonaws.com:8443/>] in the request
>>>>> [/]. Check for request manipulation or third-party intercept.*
>>>>>
>>>>> *Valid host headers are [empty] or:*
>>>>>
>>>>>- *127.0.0.1*
>>>>>- *127.0.0.1:8443 <http://127.0.0.1:8443/>*
>>>>>- *localhost*
>>>>>- *localhost:8443*
>>>>>- *[::1]*
>>>>>- *[::1]:8443*
>>>>>- *7f661ae687d7*
>>>>>- *7f661ae687d7:8443*
>>>>>- *172.17.0.2*
>>>>>- *172.17.0.2:8443 <http://172.17.0.2:8443/>*
>>>>>
>>>>>
>>>>> Does this mean I have formed the URL incorrectly?
>>>>>
>>>>> I also see that I had to add an exception to permit https. When I
>>>>> created the instance, I created my own pem key pair. It is not signed by
>>>>> any CA. For a self-signed key pair like this, do I need to install a key 
>>>>> in
>>>>> my browser s

Re: NiFi on AWS EC2

2022-11-08 Thread James McMahon
Yes sir, I did. I used the full public domain name.

On Tue, Nov 8, 2022 at 8:08 PM Dmitry Stepanov  wrote:

> Make sure you use your full domain name
> ec2-3-238-27-220.compute-1.amazonaws.com
> David shorten it in his code
>
> On November 8, 2022 5:57:26 p.m. James McMahon 
> wrote:
>
>> Thank you, David. I’ve made that change, adding the proxy host
>> specification on the docker command line. I continue to get the same error
>> message. Is it possible I need to indicate my key on the docker command
>> line too?
>>
>> Related, how can one access nifi.properties and the usual nifi config
>> files, as well as the family of nifi-app.log files and bootstrap.conf, when
>> nifi is running inside a docker container?
>>
>> Thanks again for sticking with this. I feel like we’re getting closer.
>> Jim
>>
>> On Tue, Nov 8, 2022 at 7:31 PM David Handermann <
>> exceptionfact...@apache.org> wrote:
>>
>>> Hi Jim,
>>>
>>> Good adjustment on the security group inbound rules.
>>>
>>> The error page is the result of NiFi receiving an unexpected HTTP Host
>>> header, not matching one of the expected values.
>>>
>>> For this to work, it is possible to pass the external DNS name as the
>>> value of the NIFI_WEB_PROXY_HOST environment variable. This can be
>>> specified in the docker run command as follows:
>>>
>>> docker run --name nifi -p 8443:8443 -e NIFI_WEB_PROXY_HOST=ec2...
>>> amazonaws.com -d apache/nifi:latest
>>>
>>> That will allow NiFi to accept the Host header from the browser, and
>>> then present the login screen.
>>>
>>> Regards,
>>> David Handermann
>>>
>>> On Tue, Nov 8, 2022 at 6:06 PM James McMahon 
>>> wrote:
>>>
>>>> Hi David. This is very helpful, thank you. I feel like I am close, but
>>>> I get an error. My Inbound Rules for my security group now include:
>>>> 8443 TCP (MyIP)/32
>>>> 443 TCP (MyIP)/32
>>>> 22 TCP (MyIP)/32
>>>>
>>>> In my browser - I tried both Edge and Chrome - I use this
>>>> URL:
>>>> https://ec2-3-238-27-230.compute-1.amazonaws.com:8443
>>>> I have also tried with /nifi at the tail end.
>>>>
>>>> I get this error:
>>>>
>>>> *System Error*
>>>>
>>>> *The request contained an invalid host header
>>>> [ec2-3-238-27-220.compute-1.amazonaws.com:8443
>>>> <http://ec2-3-238-27-220.compute-1.amazonaws.com:8443/>] in the request
>>>> [/]. Check for request manipulation or third-party intercept.*
>>>>
>>>> *Valid host headers are [empty] or:*
>>>>
>>>>- *127.0.0.1*
>>>>- *127.0.0.1:8443 <http://127.0.0.1:8443/>*
>>>>- *localhost*
>>>>- *localhost:8443*
>>>>- *[::1]*
>>>>- *[::1]:8443*
>>>>- *7f661ae687d7*
>>>>- *7f661ae687d7:8443*
>>>>- *172.17.0.2*
>>>>- *172.17.0.2:8443 <http://172.17.0.2:8443/>*
>>>>
>>>>
>>>> Does this mean I have formed the URL incorrectly?
>>>>
>>>> I also see that I had to add an exception to permit https. When I
>>>> created the instance, I created my own pem key pair. It is not signed by
>>>> any CA. For a self-signed key pair like this, do I need to install a key in
>>>> my browser security store to avoid adding that exception?
>>>>
>>>> Thank you for helping me get that much closer.
>>>> Jim
>>>>
>>>> On Tue, Nov 8, 2022 at 5:13 PM David Handermann <
>>>> exceptionfact...@apache.org> wrote:
>>>>
>>>>> Hi Jim,
>>>>>
>>>>> Thanks for the reply and additional background.
>>>>>
>>>>> The instructions are dated March 2021, which is prior to the release
>>>>> of NiFi 1.14.0. In particular, the run command is no longer accurate with
>>>>> the default NiFi container image.
>>>>>
>>>>> The current Docker Hub instructions [1] show the basic command needed
>>>>>
>>>>> docker run --name nifi -p 8443:8443 -d apache/nifi:latest
>>>>>
>>>>> In addition, any references to port 8080 in the AWS Security Group
>>>>> rules should be changed to 8443. The security group rules for port 80 and
>>>>> 18080 should be removed.

Re: NiFi on AWS EC2

2022-11-08 Thread James McMahon
Thank you, David. I’ve made that change, adding the proxy host
specification on the docker command line. I continue to get the same error
message. Is it possible I need to indicate my key on the docker command
line too?

Related, how can one access nifi.properties and the usual nifi config
files, as well as the family of nifi-app.log files and bootstrap.conf, when
nifi is running inside a docker container?

Thanks again for sticking with this. I feel like we’re getting closer.
Jim
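
On the config/log question above, a minimal sketch, assuming the stock apache/nifi image (which keeps its NiFi home at /opt/nifi/nifi-current unless that default has changed): the files live inside the container, so they are reached through docker rather than on the EC2 host itself:

docker exec -it nifi bash                                         # shell inside the running container
docker exec nifi cat /opt/nifi/nifi-current/conf/nifi.properties
docker exec nifi tail -f /opt/nifi/nifi-current/logs/nifi-app.log
docker cp nifi:/opt/nifi/nifi-current/conf/bootstrap.conf .       # copy a file out to the host
docker logs nifi                                                  # stdout/stderr Docker has captured

To keep conf and logs visible on the host across container restarts, those directories can also be bind-mounted with -v on the docker run line, with the caveat that an empty host directory will mask the defaults baked into the image.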

On Tue, Nov 8, 2022 at 7:31 PM David Handermann 
wrote:

> Hi Jim,
>
> Good adjustment on the security group inbound rules.
>
> The error page is the result of NiFi receiving an unexpected HTTP Host
> header, not matching one of the expected values.
>
> For this to work, it is possible to pass the external DNS name as the
> value of the NIFI_WEB_PROXY_HOST environment variable. This can be
> specified in the docker run command as follows:
>
> docker run --name nifi -p 8443:8443 -e NIFI_WEB_PROXY_HOST=ec2...
> amazonaws.com -d apache/nifi:latest
>
> That will allow NiFi to accept the Host header from the browser, and then
> present the login screen.
>
> Regards,
> David Handermann
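
One caveat worth adding here, as a sketch rather than a confirmed fix: the Host header sent by the browser includes the port, so if the invalid-host-header error persists after setting the variable, include the port (and the exact public DNS name) in the value, for example:

docker run --name nifi -p 8443:8443 \
  -e NIFI_WEB_PROXY_HOST=ec2-3-238-27-220.compute-1.amazonaws.com:8443 \
  -d apache/nifi:latest

The variable maps onto nifi.web.proxy.host, which accepts a comma-separated list of host or host:port values, so several names can be listed if the public DNS name changes between stops and starts.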
>
> On Tue, Nov 8, 2022 at 6:06 PM James McMahon  wrote:
>
>> Hi David. This is very helpful, thank you. I feel like I am close, but I
>> get an error. My Inbound Rules for my security group now include:
>> 8443 TCP (MyIP)/32
>> 443 TCP (MyIP)/32
>> 22 TCP (MyIP)/32
>>
>> In my browser - I tried both Edge and Chrome - I use this
>> URL:
>> https://ec2-3-238-27-230.compute-1.amazonaws.com:8443
>> I have also tried with /nifi at the tail end.
>>
>> I get this error:
>>
>> *System Error*
>>
>> *The request contained an invalid host header
>> [ec2-3-238-27-220.compute-1.amazonaws.com:8443
>> <http://ec2-3-238-27-220.compute-1.amazonaws.com:8443/>] in the request
>> [/]. Check for request manipulation or third-party intercept.*
>>
>> *Valid host headers are [empty] or:*
>>
>>- *127.0.0.1*
>>- *127.0.0.1:8443 <http://127.0.0.1:8443/>*
>>- *localhost*
>>- *localhost:8443*
>>- *[::1]*
>>- *[::1]:8443*
>>- *7f661ae687d7*
>>- *7f661ae687d7:8443*
>>- *172.17.0.2*
>>- *172.17.0.2:8443 <http://172.17.0.2:8443/>*
>>
>>
>> Does this mean I have formed the URL incorrectly?
>>
>> I also see that I had to add an exception to permit https. When I created
>> the instance, I created my own pem key pair. It is not signed by any CA.
>> For a self-signed key pair like this, do I need to install a key in my
>> browser security store to avoid adding that exception?
>>
>> Thank you for helping me get that much closer.
>> Jim
>>
>> On Tue, Nov 8, 2022 at 5:13 PM David Handermann <
>> exceptionfact...@apache.org> wrote:
>>
>>> Hi Jim,
>>>
>>> Thanks for the reply and additional background.
>>>
>>> The instructions are dated March 2021, which is prior to the release of
>>> NiFi 1.14.0. In particular, the run command is no longer accurate with the
>>> default NiFi container image.
>>>
>>> The current Docker Hub instructions [1] show the basic command needed
>>>
>>> docker run --name nifi -p 8443:8443 -d apache/nifi:latest
>>>
>>> In addition, any references to port 8080 in the AWS Security Group rules
>>> should be changed to 8443. The security group rules for port 80 and 18080
>>> should be removed.
>>>
>>> The instructions that allow plain HTTP access to NiFi on port 8080
>>> should NEVER be followed, as this exposes unfiltered and unauthenticated
>>> access.
>>>
>>> Following those changes, it should be possible to access the NiFi UI
>>> using the AWS URL:
>>>
>>> https://ec2...amazonaws.com:8443
>>>
>>> The default installation will generate a username and password, which
>>> can be found in the container logs:
>>>
>>> docker logs nifi | grep Generated
>>>
>>> Regards,
>>> David Handermann
>>>
>>> [1] https://hub.docker.com/r/apache/nifi
>>>
>>> On Tue, Nov 8, 2022 at 4:00 PM James McMahon 
>>> wrote:
>>>
>>>> Hi and thank you, David and Dmitry. In my case I was following this
>>>> example,
>>>>
>>>> https://joeygoksu.com/software/apache-nifi-on-aws/
>>>>
>>>> which results in NiFi installed within a container. So to answer one of
>>>> your questions, I do

Re: NiFi on AWS EC2

2022-11-08 Thread James McMahon
Hi David. This is very helpful, thank you. I feel like I am close, but I
get an error. My Inbound Rules for my security group now include:
8443 TCP (MyIP)/32
443 TCP (MyIP)/32
22 TCP (MyIP)/32

In my browser - I tried both Edge and Chrome - I use this
URL:
https://ec2-3-238-27-230.compute-1.amazonaws.com:8443
I have also tried with /nifi at the tail end.

I get this error:

*System Error*

*The request contained an invalid host header
[ec2-3-238-27-220.compute-1.amazonaws.com:8443
<http://ec2-3-238-27-220.compute-1.amazonaws.com:8443/>] in the request
[/]. Check for request manipulation or third-party intercept.*

*Valid host headers are [empty] or:*

   - *127.0.0.1*
   - *127.0.0.1:8443 <http://127.0.0.1:8443/>*
   - *localhost*
   - *localhost:8443*
   - *[::1]*
   - *[::1]:8443*
   - *7f661ae687d7*
   - *7f661ae687d7:8443*
   - *172.17.0.2*
   - *172.17.0.2:8443 <http://172.17.0.2:8443/>*


Does this mean I have formed the URL incorrectly?

I also see that I had to add an exception to permit https. When I created
the instance, I created my own pem key pair. It is not signed by any CA.
For a self-signed key pair like this, do I need to install a key in my
browser security store to avoid adding that exception?

Thank you for helping me get that much closer.
Jim

On Tue, Nov 8, 2022 at 5:13 PM David Handermann 
wrote:

> Hi Jim,
>
> Thanks for the reply and additional background.
>
> The instructions are dated March 2021, which is prior to the release of
> NiFi 1.14.0. In particular, the run command is no longer accurate with the
> default NiFi container image.
>
> The current Docker Hub instructions [1] show the basic command needed
>
> docker run --name nifi -p 8443:8443 -d apache/nifi:latest
>
> In addition, any references to port 8080 in the AWS Security Group rules
> should be changed to 8443. The security group rules for port 80 and 18080
> should be removed.
>
> The instructions that allow plain HTTP access to NiFi on port 8080 should
> NEVER be followed, as this exposes unfiltered and unauthenticated access.
>
> Following those changes, it should be possible to access the NiFi UI using
> the AWS URL:
>
> https://ec2...amazonaws.com:8443
>
> The default installation will generate a username and password, which can
> be found in the container logs:
>
> docker logs nifi | grep Generated
>
> Regards,
> David Handermann
>
> [1] https://hub.docker.com/r/apache/nifi
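
A related sketch, assuming the image still honors the single-user credential overrides described on that Docker Hub page: fixed credentials can be supplied at startup instead of fishing the generated ones out of the logs (the password must be at least twelve characters):

docker run --name nifi -p 8443:8443 \
  -e SINGLE_USER_CREDENTIALS_USERNAME=admin \
  -e SINGLE_USER_CREDENTIALS_PASSWORD=supersecretpassword1 \
  -d apache/nifi:latest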
>
> On Tue, Nov 8, 2022 at 4:00 PM James McMahon  wrote:
>
>> Hi and thank you, David and Dmitry. In my case I was following this
>> example,
>>
>> https://joeygoksu.com/software/apache-nifi-on-aws/
>>
>> which results in NiFi installed within a container. So to answer one of
>> your questions, I don’t yet know how or where to find nifi.properties in
>> the container framework. I don’t seem to have the usual /opt/nifi/…..
>> directories on my ec2 instance. Any idea where I need to look for that?
>>
>> These ports are open by my security group Inbound Rules: 22 to MyIP, 80,
>> 8080, and 18080 (per the link) to 0.0.0.0/0, 443 to MyIP.
>>
>> I am able to Putty into my instance as ec2-user with my ppk file, which I
>> created using putty tools from the original pem key pair. When I do putty
>> in, under /opt I find three subdirectories: aws, containerd, and rh.
>> Nothing nifi under any of the three that I can see so far.
>>
>> I start my docker instance with this command:
>> docker run --name nifi -p 18080:8080 -d apache/nifi:latest
>>
>> I can do a ps -ef and see running nifi processes. But I don’t yet know
>> how to get to the nifi logs or properties file.
>>
>> You mentioned using using localhost to get to the canvas UI. This
>> confuses me. Nifi is running on my EC2 instance - a linux host without a
>> browser. I’m in a browser on my laptop. How would localhost in my browser
>> get me to my EC2 instance running nifi?
>>
>> This is the URL I’m using in my browser:
>> http://ec2-3-238-27-220.compute-1.amazonaws.com
>> (that url changes with each Stop/Start of my instance. I’ve yet to
>> investigate how to get AWS to stop changing that IP, but I know it can be
>> done).
>>
>> The browser replies with: ec2…….amazonaws refused to connect.
>>
>> I can ping my laptop IP address from the putty terminal where I am logged
>> in to my instance. I cannot ping the Public DNS of my instance from
>> Powershell on my laptop. Again, that Public DNS is
>> ec2-3-238-27-220.compute-1.amazonaws.com
>>
>> Any help is much appreciated.
>> Jim
>>
>>
>>
>> On Tue, Nov 8, 2022 at 3:03 PM David Handermann <
>> exceptionfact...@apache.org> wrote:

Re: NiFi on AWS EC2

2022-11-08 Thread James McMahon
Hi and thank you, David and Dmitry. In my case I was following this
example,

https://joeygoksu.com/software/apache-nifi-on-aws/

which results in NiFi installed within a container. So to answer one of
your questions, I don’t yet know how or where to find nifi.properties in
the container framework. I don’t seem to have the usual /opt/nifi/…..
directories on my ec2 instance. Any idea where I need to look for that?

These ports are open by my security group Inbound Rules: 22 to MyIP, 80,
8080, and 18080 (per the link) to 0.0.0.0/0, 443 to MyIP.

I am able to Putty into my instance as ec2-user with my ppk file, which I
created using putty tools from the original pem key pair. When I do putty
in, under /opt I find three subdirectories: aws, containerd, and rh.
Nothing nifi under any of the three that I can see so far.

I start my docker instance with this command:
docker run --name nifi -p 18080:8080 -d apache/nifi:latest

I can do a ps -ef and see running nifi processes. But I don’t yet know how
to get to the nifi logs or properties file.

You mentioned using using localhost to get to the canvas UI. This confuses
me. Nifi is running on my EC2 instance - a linux host without a browser.
I’m in a browser on my laptop. How would localhost in my browser get me to
my EC2 instance running nifi?
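
One way localhost can in fact reach the instance, sketched here on the assumption that NiFi ends up published on host port 8443 and that the same .pem key used for PuTTY is available: tunnel the port over the SSH access that is already open, instead of exposing it through the security group:

ssh -i my-key.pem -L 8443:localhost:8443 ec2-user@ec2-3-238-27-220.compute-1.amazonaws.com

With the tunnel up, https://localhost:8443/nifi in the laptop browser is forwarded to the listener on the instance, and localhost:8443 is already one of the Host header values NiFi accepts.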

This is the URL I’m using in my browser:
http://ec2-3-238-27-220.compute-1.amazonaws.com
(that url changes with each Stop/Start of my instance. I’ve yet to
investigate how to get AWS to stop changing that IP, but I know it can be
done).

The browser replies with: ec2…….amazonaws refused to connect.

I can ping my laptop IP address from the putty terminal where I am logged
in to my instance. I cannot ping the Public DNS of my instance from
Powershell on my laptop. Again, that Public DNS is
ec2-3-238-27-220.compute-1.amazonaws.com

Any help is much appreciated.
Jim



On Tue, Nov 8, 2022 at 3:03 PM David Handermann 
wrote:

> Hi Jim,
>
> NiFi 1.14.0 and following default to HTTPS on port 8443, listening on the
> localhost address. The nifi.web.https.host can be changed to blank in order
> to listen on all interfaces, but the default HTTPS setting with
> authenticated required should be retained.
>
> Can you provide the version of NiFi and some additional details on the
> nifi.web values from nifi.properties?
>
> Regards,
> David Handermann
>
> On Tue, Nov 8, 2022 at 1:54 PM James McMahon  wrote:
>
>> Has anyone successfully configured NiFi on AWS, and accessed it from a
>> browser on a Windows desktop? I’ve tried following a few links to do this.
>> I’ve verified that my instance security group allows access to 8080 via its
>> inbound rules. I’ve putty’ed into the instance via ssh port 22 to verify
>> that there are no firewall restrictions. But still I get a message to the
>> effect that the server rejected the connection request. Can anyone
>> recommend a link that describes a success path for this?
>> Thanks in advance for your help.
>> Jim
>>
>


NiFi on AWS EC2

2022-11-08 Thread James McMahon
Has anyone successfully configured NiFi on AWS, and accessed it from a
browser on a Windows desktop? I’ve tried following a few links to do this.
I’ve verified that my instance security group allows access to 8080 via its
inbound rules. I’ve putty’ed into the instance via ssh port 22 to verify
that there are no firewall restrictions. But still I get a message to the
effect that the server rejected the connection request. Can anyone
recommend a link that describes a success path for this?
Thanks in advance for your help.
Jim


system-diagnostics for cluster or node?

2022-10-20 Thread James McMahon
When I am on a node that is part of a nifi cluster configuration and I
issue this REST API call from a browser...

https://1.2.3.4:8443/nifi-api/system-diagnostics

...is there a parameter that can be applied on that call to tell nifi I
want aggregated statistics for the cluster? And is that the default I get
with no additional parameters on the call?

In some cases I want diagnostics for the single node, and in some cases I
want them for the cluster. On the Summary page there is a link for "single
node" or cluster. How do I toggle on the REST call?

Thank you in advance.
Jim
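
A sketch of the relevant query parameters, assuming a bearer token in $TOKEN (obtainable from POST /nifi-api/access/token) and with the caveat that the parameter names should be checked against the REST API docs for your release:

curl -k -H "Authorization: Bearer $TOKEN" \
  "https://1.2.3.4:8443/nifi-api/system-diagnostics"
curl -k -H "Authorization: Bearer $TOKEN" \
  "https://1.2.3.4:8443/nifi-api/system-diagnostics?nodewise=true"
curl -k -H "Authorization: Bearer $TOKEN" \
  "https://1.2.3.4:8443/nifi-api/system-diagnostics?clusterNodeId=<node-uuid>"

With no parameters the clustered response should be the aggregated view; nodewise=true asks for the per-node breakdown, and clusterNodeId (the id reported by /nifi-api/controller/cluster) narrows it to a single node.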


Change replication in the cluster

2022-10-05 Thread James McMahon
We have been experiencing occasional failures to restart nifi services on a
cluster node because the flow.xml.gz falls out of synch with the other
nodes of the cluster. While researching this, I found an article that
discusses possible causes. That article mentions this:

"A change replication request was made to all nodes in the cluster. One or
more nodes failed to process that request in the configured allowable
nifi.cluster.node.connection.timeout and/or nifi.cluster.node.read.timeout."

My question: is that change replication request exclusively node-to-node
communication, or does our external zookeeper play any role in that?

Thank you.
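
For reference, a sketch of where those two settings live, in nifi.properties on each node; the out-of-the-box values are short (5 secs, if memory serves), and raising them is a common first step when replication requests time out, though the right numbers depend on the environment:

nifi.cluster.node.connection.timeout=30 secs
nifi.cluster.node.read.timeout=30 secs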


Re: Expression language to handle a capture group

2022-10-04 Thread James McMahon
I continue to dig in an effort to try and get this to work. Found this,
which made me hopeful:
https://stackoverflow.com/questions/62512560/how-to-use-regex-capturing-group-variable-in-nifi-expression-language
Essentially it says to wrap single quotes around the capture group
reference, which would get me to this:

filename ${filename:replaceFirst('([0-9]{14})',
'*'$1'*:toDate("MMddHHmmss","UTC"):toNumber()'}


It does not like this either:
filename ${filename:replaceFirst('([0-9]{14})',
'*${'$1'*:toDate("MMddHHmmss","UTC"):toNumber()}'}


Unfortunately it continues to choke. I don't think it likes the single
quotes in the second expression of the replaceFirst('', '') being
followed immediately by the single quote at the front of $1.

Jim

On Tue, Oct 4, 2022 at 11:19 AM James McMahon  wrote:

> I have an incoming filename that includes the pattern
> prefix20221004035958postfix. I need to convert that yMdHms to seconds since
> the epoch within that filename.
>
> This is the expression I attempt to use, but it seems to choke on the
> capture group reference:
> filename ${filename:replaceFirst('([0-9]{14})', '*$1*
> :toDate("MMddHHmmss","UTC"):toNumber()'}
>
> How do I get this to work, applying expression language functions to the
> capture group?
> Thanks.
>
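
A sketch of the usual workaround, assuming UpdateAttribute is acceptable: pull the digits out first, then chain the date functions onto the extracted value rather than trying to evaluate them inside the replacement string. In one UpdateAttribute:

ts.epochMillis   ${filename:replaceAll('.*?([0-9]{14}).*', '$1'):toDate('yyyyMMddHHmmss','UTC'):toNumber()}

and then, in a second UpdateAttribute (so the new attribute is visible to the expression), substitute it back into the name:

filename   ${filename:replaceFirst('[0-9]{14}', ${ts.epochMillis})}

toNumber() yields milliseconds since the epoch; chain :divide(1000) onto it if seconds are what the downstream step expects.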


Expression language to handle a capture group

2022-10-04 Thread James McMahon
I have an incoming filename that includes the pattern
prefix20221004035958postfix. I need to convert that yMdHms to seconds since
the epoch within that filename.

This is the expression I attempt to use, but it seems to choke on the
capture group reference:
filename ${filename:replaceFirst('([0-9]{14})', '*$1*
:toDate("MMddHHmmss","UTC"):toNumber()'}

How do I get this to work, applying expression language functions to the
capture group?
Thanks.


Is result milliseconds since epoch?

2022-10-03 Thread James McMahon
I have a string representation of a datetime parsed from a lengthy
filename, attribute myValue with value of 20221003055959. If I convert to a
true datetime value and then apply toNumber() to that, does toNumber()
return milliseconds since the epoch? The expression language guide doesn’t
go into too much detail.
Thank you.
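
As far as I can tell, yes: NiFi holds a date as milliseconds since the epoch, and toNumber() returns exactly that value. A quick sketch with the value above, assuming UTC is the intended zone:

${myValue:toDate('yyyyMMddHHmmss','UTC'):toNumber()}                  ->  1664776799000
${myValue:toDate('yyyyMMddHHmmss','UTC'):toNumber():divide(1000)}     ->  1664776799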


Re: Can ExecuteStreamCommand do this?

2022-09-30 Thread James McMahon
Mike, let me make sure I understand this. Gzip outputs gz files that have
some reasonable level of compression. Because NiFi natively handles gzip
compressed files - presumably .gz extensions and some associated mime.type
- that is good enough for your purposes. You avoid 7za compression because
NiFi doesn't handle such compressed files natively, and because the gain in
compression is of little utility when S3 storage comes so cheaply; gzip
results are good enough.
Is that the gist of it?

On Fri, Sep 30, 2022 at 8:27 AM Mike Thomsen  wrote:

> I don't know what your use case is, but we avoid anything beyond gzip
> because S3 is so cheap.
>
> On Thu, Sep 29, 2022 at 10:51 AM James McMahon 
> wrote:
> >
> > Thank you Mark. Had no idea there was this file-based dependency to 7z
> files. Since my workaround appears to be working I think I may just move
> forward with that.
> > Steve, Mark - thank you again for replying.
> > Jim
> >
> > On Thu, Sep 29, 2022 at 9:15 AM Mark Payne  wrote:
> >>
> >> It’s been a while. But if I remember correctly, the reason that NiFi
> does not natively support 7-zip format is that with 7-zip, the dictionary
> is written at the end of the file.
> >> So when data is compressed, the dictionary is built up during
> compression and written at the end. This makes sense from a compression
> standpoint.
> >> However, what it means, is that in order to decompress it, you must
> first jump to the end of the file in order to access the dictionary. Then
> jump back to the beginning of the file in order to perform the
> decompression.
> >> NiFi makes use of Input Streams and Output Streams for FlowFIle access
> - it doesn’t provide a File-based approach. And this ability to jump to the
> end, read the dictionary, and then jump back to the beginning isn’t really
> possible with Input/Output Streams - at least, not without buffering
> everything into memory.
> >>
> >> So it would make sense that there would be a “Not Implemented” error
> when attempting to do the same thing using the 7-zip application directly,
> when attempting to use input streams & output streams.
> >> I think that if you’re stuck with 7-zip, your only option will be to do
> what you’re doing - write the data out as a file, run the 7-zip application
> against that file, writing the output to some directory, and then picking
> up the files from that directory.
> >> The alternative, of course, would be to update the source so that it’s
> creating zip files instead of 7-zip files, if you have sway over the source
> producer.
> >>
> >> Thanks
> >> -Mark
> >>
> >>
> >> On Sep 29, 2022, at 8:58 AM, stephen.hindmarch.bt.com via users <
> users@nifi.apache.org> wrote:
> >>
> >> James,
> >>
> >> E_NOTIMPL means that feature is not implemented. I can see there is
> discussion about this down at sourceforge but the detail is blocked by my
> employer’s firewall.
> >>
> >> p7zip / Discussion / Help: E_NOTIMPL for stdin / stdout pipe
> >>
> >> https://sourceforge.net/p/p7zip/discussion/383044/thread/8066736d
> >>
> >> Steve Hindmarch
> >>
> >> From: James McMahon 
> >> Sent: 29 September 2022 12:12
> >> To: Hindmarch,SJ,Stephen,VIR R 
> >> Cc: users@nifi.apache.org
> >> Subject: Re: Can ExecuteStreamCommand do this?
> >>
> >> I ran with these Command Arguments in the ExecuteStreamCommand
> configuration:
> >> x;-si;-so;-spf;-aou
> >> ${filename} removed, -si indicating use of STDIN, -so STDOUT.
> >>
> >> The same error is thrown by 7z through ExecuteStreamCommand: Executable
> command /bin/7za ended in an error: ERROR: Can not open the file as an
> archive  E_NOTIMPL
> >>
> >> I tried this at the command line, getting the same failure:
> >> cat testArchive.7z | 7za x -si -so | dd of=stooges.txt
> >>
> >>
> >> On Thu, Sep 29, 2022 at 6:44 AM James McMahon 
> wrote:
> >>
> >> Good morning, Steve. Indeed, that second paragraph is exactly how I did
> get this to work. I unpack to disk and then read in the twelve results
> using a GetFile. So far it is working well. It just feels a little wrong to
> me to do this, as I have introduced an extra write to and read from disk,
> which is going to be slower than doing it all in memory within the JVM.
> While that may not seem like anything significant for a single 7z file, as
> we work across thousands and thousands it can be significant.
> >>
> >> I am about to try what you suggested above: dropping the ${filename}
> e

Re: Can ExecuteStreamCommand do this?

2022-09-29 Thread James McMahon
Thank you Mark. Had no idea there was this file-based dependency to 7z
files. Since my workaround appears to be working I think I may just move
forward with that.
Steve, Mark - thank you again for replying.
Jim

On Thu, Sep 29, 2022 at 9:15 AM Mark Payne  wrote:

> It’s been a while. But if I remember correctly, the reason that NiFi does
> not natively support 7-zip format is that with 7-zip, the dictionary is
> written at the end of the file.
> So when data is compressed, the dictionary is built up during compression
> and written at the end. This makes sense from a compression standpoint.
> However, what it means, is that in order to decompress it, you must first
> jump to the end of the file in order to access the dictionary. Then jump
> back to the beginning of the file in order to perform the decompression.
> NiFi makes use of Input Streams and Output Streams for FlowFIle access -
> it doesn’t provide a File-based approach. And this ability to jump to the
> end, read the dictionary, and then jump back to the beginning isn’t really
> possible with Input/Output Streams - at least, not without buffering
> everything into memory.
>
> So it would make sense that there would be a “Not Implemented” error when
> attempting to do the same thing using the 7-zip application directly, when
> attempting to use input streams & output streams.
> I think that if you’re stuck with 7-zip, your only option will be to do
> what you’re doing - write the data out as a file, run the 7-zip application
> against that file, writing the output to some directory, and then picking
> up the files from that directory.
> The alternative, of course, would be to update the source so that it’s
> creating zip files instead of 7-zip files, if you have sway over the source
> producer.
>
> Thanks
> -Mark
>
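
Given that constraint, one middle ground can be sketched, with two assumptions called out up front: that ExecuteGroovyScript (or ExecuteScript with Groovy) is an option, and that Apache Commons Compress is available to the script (it may need to be added to the processor's classpath). The staging file Mark describes is still there, but it is a throwaway temp file, and the extracted entries come back as flowfiles rather than landing in a directory for GetFile to poll:

import org.apache.commons.compress.archivers.sevenz.SevenZFile
import org.apache.nifi.processor.io.OutputStreamCallback

def flowFile = session.get()
if (!flowFile) return

// SevenZFile needs random access, so stage the incoming archive to a temp file
File tmp = File.createTempFile('nifi-7z-', '.7z')
try {
    session.exportTo(flowFile, tmp.toPath(), false)
    SevenZFile sevenZ = new SevenZFile(tmp)
    try {
        def entry = sevenZ.nextEntry
        while (entry != null) {
            if (!entry.directory) {
                def child = session.create(flowFile)          // child keeps the parent's attributes and lineage
                child = session.write(child, { out ->
                    byte[] buf = new byte[8192]
                    int n
                    while ((n = sevenZ.read(buf)) > 0) {
                        out.write(buf, 0, n)                  // copy the current entry into the child's content
                    }
                } as OutputStreamCallback)
                child = session.putAttribute(child, 'filename', entry.name)
                session.transfer(child, REL_SUCCESS)
            }
            entry = sevenZ.nextEntry
        }
    } finally {
        sevenZ.close()
    }
    session.remove(flowFile)    // drop the original archive once the children are out
} finally {
    tmp.delete()
}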
>
> On Sep 29, 2022, at 8:58 AM, stephen.hindmarch.bt.com via users <
> users@nifi.apache.org> wrote:
>
> James,
>
> E_NOTIMPL means that feature is not implemented. I can see there is
> discussion about this down at sourceforge but the detail is blocked by my
> employer’s firewall.
>
> p7zip / Discussion / Help: E_NOTIMPL for stdin / stdout pipe
> <https://sourceforge.net/p/p7zip/discussion/383044/thread/8066736d/>
>
> https://sourceforge.net/p/p7zip/discussion/383044/thread/8066736d
>
> *Steve Hindmarch*
>
> *From:* James McMahon 
> *Sent:* 29 September 2022 12:12
> *To:* Hindmarch,SJ,Stephen,VIR R 
> *Cc:* users@nifi.apache.org
> *Subject:* Re: Can ExecuteStreamCommand do this?
>
> I ran with these Command Arguments in the ExecuteStreamCommand
> configuration:
> x;-si;-so;-spf;-aou
> ${filename} removed, -si indicating use of STDIN, -so STDOUT.
>
> The same error is thrown by 7z through ExecuteStreamCommand: Executable
> command /bin/7za ended in an error: ERROR: Can not open the file as an
> archive  E_NOTIMPL
>
> I tried this at the command line, getting the same failure:
> cat testArchive.7z | 7za x -si -so | dd of=stooges.txt
>
>
> On Thu, Sep 29, 2022 at 6:44 AM James McMahon 
> wrote:
>
> Good morning, Steve. Indeed, that second paragraph is *exactly* how I did
> get this to work. I unpack to disk and then read in the twelve results
> using a GetFile. So far it is working well. It just feels a little wrong to
> me to do this, as I have introduced an extra write to and read from disk,
> which is going to be slower than doing it all in memory within the JVM.
> While that may not seem like anything significant for a single 7z file, as
> we work across thousands and thousands it can be significant.
>
> I am about to try what you suggested above: dropping the ${filename}
> entirely from the STDIN / STDOUT configuration. I realize it is not likely
> going to give me the twelve output flowfiles I'm seeking in the "output
> stream" path from ExecuteStreamCommand. I just want to see if it works
> without throwing that error.
>
> Welcome any other thoughts or comments you may have. Thanks again for your
> comments so far.
>
> Jim
>
> On Thu, Sep 29, 2022 at 5:23 AM  wrote:
>
> James,
>
> I have been thinking more about your problem and this may be the wrong
> approach. If you successfully unpack your files into the flow file content,
> you will still have one output flow file containing the unpacked contents
> of all of your files. If you need 12 separate files in their own flowfiles
> then you will need to find some way of splitting them up. Is there a byte
> sequence you can use in a SplitContent process, or a specific file length
> you can use in SplitText?
>
> Otherwise you may be better off using ExecuteStreamCommand to unpack the
> files on disk. Run it verbosely and use the output of that step to create a
>

Re: Can ExecuteStreamCommand do this?

2022-09-29 Thread James McMahon
I ran with these Command Arguments in the ExecuteStreamCommand
configuration:
x;-si;-so;-spf;-aou
${filename} removed, -si indicating use of STDIN, -so STDOUT.

The same error is thrown by 7z through ExecuteStreamCommand: Executable
command /bin/7za ended in an error: ERROR: Can not open the file as an
archive  E_NOTIMPL

I tried this at the command line, getting the same failure:
cat testArchive.7z | 7za x -si -so | dd of=stooges.txt


On Thu, Sep 29, 2022 at 6:44 AM James McMahon  wrote:

> Good morning, Steve. Indeed, that second paragraph is *exactly* how I did
> get this to work. I unpack to disk and then read in the twelve results
> using a GetFile. So far it is working well. It just feels a little wrong to
> me to do this, as I have introduced an extra write to and read from disk,
> which is going to be slower than doing it all in memory within the JVM.
> While that may not seem like anything significant for a single 7z file, as
> we work across thousands and thousands it can be significant.
>
> I am about to try what you suggested above: dropping the ${filename}
> entirely from the STDIN / STDOUT configuration. I realize it is not likely
> going to give me the twelve output flowfiles I'm seeking in the "output
> stream" path from ExecuteStreamCommand. I just want to see if it works
> without throwing that error.
>
> Welcome any other thoughts or comments you may have. Thanks again for your
> comments so far.
>
> Jim
>
> On Thu, Sep 29, 2022 at 5:23 AM  wrote:
>
>> James,
>>
>>
>>
>> I have been thinking more about your problem and this may be the wrong
>> approach. If you successfully unpack your files into the flow file content,
>> you will still have one output flow file containing the unpacked contents
>> of all of your files. If you need 12 separate files in their own flowfiles
>> then you will need to find some way of splitting them up. Is there a byte
>> sequence you can use in a SplitContent process, or a specific file length
>> you can use in SplitText?
>>
>>
>>
>> Otherwise you may be better off using ExecuteStreamCommand to unpack the
>> files on disk. Run it verbosely and use the output of that step to create a
>> list of the locations where your recently unpacked files are. Or create a
>> temporary directory to unpack in and fetch all the files in there, cleaning
>> up aftwerwards. Then you can load the files with FetchFile. FetchFile can
>> be instructed to delete the file it has just read so can also clean up
>> after itself.
>>
>>
>>
>> *Steve Hindmarch*
>>
>>
>>
>> *From:* stephen.hindmarch.bt.com via users 
>> *Sent:* 29 September 2022 09:19
>> *To:* jsmcmah...@gmail.com; users@nifi.apache.org
>> *Subject:* RE: Can ExecuteStreamCommand do this?
>>
>>
>>
>> James,
>>
>>
>>
>> Using ${filename} and -si together seems wrong to me. What happens when
>> you try that on the command line?
>>
>>
>>
>> *Steve Hindmarch*
>>
>>
>>
>> *From:* James McMahon 
>> *Sent:* 28 September 2022 13:49
>> *To:* users@nifi.apache.org; Hindmarch,SJ,Stephen,VIR R <
>> stephen.hindma...@bt.com>
>> *Subject:* Re: Can ExecuteStreamCommand do this?
>>
>>
>>
>> Thank you Steve. I 've employed a ListFile/FetchFile to load the 7z files
>> into the flow . When I have my ESC configured like this following, I get my
>> unpacked files results to the #{unpacked.destination} directory on disk:
>>
>> Command Arguments
>> x;${filename};-spf;-o#{unpacked.destination};-aou
>>
>> Command Path/bin/7a
>>
>> Ignore STDIN   true
>>
>> Working Directory#{unpacked.destination}
>>
>> Argument Delimiter   ;
>>
>> Output Destination Attribute  No value set
>>
>> I get twelve files in my output destination folder.
>>
>>
>>
>> When I try this one, get an error and no output:
>>
>> Command Argumentsx;${filename};-si;-so;-spf;-aou
>>
>> Command Path/bin/7a
>>
>> Ignore STDIN   false
>>
>> Working Directory#{unpacked.destination}
>>
>> Argument Delimiter   ;
>>
>> Output Destination Attribute  No value set
>>
>>
>>
>> This yields this error...
>>
>> Executable command /bin/7za ended in an error: ERROR: Can not open the
>> file as archive
>>
>> E_NOTIMPL
>>
>> ...and it yields only one flowfile result in Output St

Re: Can ExecuteStreamCommand do this?

2022-09-29 Thread James McMahon
Good morning, Steve. Indeed, that second paragraph is *exactly* how I did
get this to work. I unpack to disk and then read in the twelve results
using a GetFile. So far it is working well. It just feels a little wrong to
me to do this, as I have introduced an extra write to and read from disk,
which is going to be slower than doing it all in memory within the JVM.
While that may not seem like anything significant for a single 7z file, as
we work across thousands and thousands it can be significant.
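
If that extra round trip ever becomes a measurable cost, one mitigation can be sketched, assuming root access on the host and enough RAM to hold the archives being unpacked at any one time: point #{unpacked.destination} at a tmpfs mount, so the staging files never touch physical disk:

sudo mkdir -p /mnt/nifi-unpack
sudo mount -t tmpfs -o size=1g tmpfs /mnt/nifi-unpack

The write-then-read pattern stays the same, but both sides of it run at memory speed.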

I am about to try what you suggested above: dropping the ${filename}
entirely from the STDIN / STDOUT configuration. I realize it is not likely
going to give me the twelve output flowfiles I'm seeking in the "output
stream" path from ExecuteStreamCommand. I just want to see if it works
without throwing that error.

Welcome any other thoughts or comments you may have. Thanks again for your
comments so far.

Jim

On Thu, Sep 29, 2022 at 5:23 AM  wrote:

> James,
>
>
>
> I have been thinking more about your problem and this may be the wrong
> approach. If you successfully unpack your files into the flow file content,
> you will still have one output flow file containing the unpacked contents
> of all of your files. If you need 12 separate files in their own flowfiles
> then you will need to find some way of splitting them up. Is there a byte
> sequence you can use in a SplitContent process, or a specific file length
> you can use in SplitText?
>
>
>
> Otherwise you may be better off using ExecuteStreamCommand to unpack the
> files on disk. Run it verbosely and use the output of that step to create a
> list of the locations where your recently unpacked files are. Or create a
> temporary directory to unpack in and fetch all the files in there, cleaning
> up aftwerwards. Then you can load the files with FetchFile. FetchFile can
> be instructed to delete the file it has just read so can also clean up
> after itself.
>
>
>
> *Steve Hindmarch*
>
>
>
> *From:* stephen.hindmarch.bt.com via users 
> *Sent:* 29 September 2022 09:19
> *To:* jsmcmah...@gmail.com; users@nifi.apache.org
> *Subject:* RE: Can ExecuteStreamCommand do this?
>
>
>
> James,
>
>
>
> Using ${filename} and -si together seems wrong to me. What happens when
> you try that on the command line?
>
>
>
> *Steve Hindmarch*
>
>
>
> *From:* James McMahon 
> *Sent:* 28 September 2022 13:49
> *To:* users@nifi.apache.org; Hindmarch,SJ,Stephen,VIR R <
> stephen.hindma...@bt.com>
> *Subject:* Re: Can ExecuteStreamCommand do this?
>
>
>
> Thank you Steve. I 've employed a ListFile/FetchFile to load the 7z files
> into the flow . When I have my ESC configured like this following, I get my
> unpacked files results to the #{unpacked.destination} directory on disk:
>
> Command Arguments
> x;${filename};-spf;-o#{unpacked.destination};-aou
>
> Command Path/bin/7a
>
> Ignore STDIN   true
>
> Working Directory#{unpacked.destination}
>
> Argument Delimiter   ;
>
> Output Destination Attribute  No value set
>
> I get twelve files in my output destination folder.
>
>
>
> When I try this one, get an error and no output:
>
> Command Argumentsx;${filename};-si;-so;-spf;-aou
>
> Command Path/bin/7a
>
> Ignore STDIN   false
>
> Working Directory#{unpacked.destination}
>
> Argument Delimiter   ;
>
> Output Destination Attribute  No value set
>
>
>
> This yields this error...
>
> Executable command /bin/7za ended in an error: ERROR: Can not open the
> file as archive
>
> E_NOTIMPL
>
> ...and it yields only one flowfile result in Output Stream, and that is a
> brief text/plain report of the results of the 7za extraction like this:
>
>
>
> This indicates it did indeed find my 7z file and it did indeed identify
> the 12 files in it, yet still I get no output to my outgoing flow path:
>
> Extracting archive: /parent/subparent/testArchive.7z
>
> - -
>
> Path = /parentdir/subdir/testArchive.7z
>
> Type = 7z
>
> Physical Size = 7204
>
> Headers Size = 298
>
> Method = LZMA2:96k
>
> Solid = +
>
> Blocks = 1
>
>
>
> Everything is Ok
>
>
>
> Folders: 1
>
> Files: 12
>
> Size: 90238
>
> Compressed: 7204
>
>
>
> ${filename} in both cases is a fully qualified name to the file, like
> this: /dir/subdir/myTestFile.7z.
>
>
>
> I can't seem to get the ESC output stream to be the extracted files.
> Anything jump out at you?
>
>
>
> On Wed, Sep 28, 2022 at 8:06 AM stephen.hindmarch.bt.com
>

Re: Can ExecuteStreamCommand do this?

2022-09-28 Thread James McMahon
Thank you Steve. I've employed a ListFile/FetchFile to load the 7z files
into the flow. When I have my ESC configured as follows, I get my
unpacked file results in the #{unpacked.destination} directory on disk:
Command Arguments             x;${filename};-spf;-o#{unpacked.destination};-aou
Command Path                  /bin/7za
Ignore STDIN                  true
Working Directory             #{unpacked.destination}
Argument Delimiter            ;
Output Destination Attribute  No value set
I get twelve files in my output destination folder.

When I try this one, I get an error and no output:
Command Arguments             x;${filename};-si;-so;-spf;-aou
Command Path                  /bin/7za
Ignore STDIN                  false
Working Directory             #{unpacked.destination}
Argument Delimiter            ;
Output Destination Attribute  No value set

This yields this error...
Executable command /bin/7za ended in an error: ERROR: Can not open the file
as archive
E_NOTIMPL
...and it yields only one flowfile result in Output Stream, and that is a
brief text/plain report of the results of the 7za extraction like this:

This indicates it did indeed find my 7z file and it did indeed identify the
12 files in it, yet still I get no output to my outgoing flow path:
Extracting archive: /parent/subparent/testArchive.7z
- -
Path = /parentdir/subdir/testArchive.7z
Type = 7z
Physical Size = 7204
Headers Size = 298
Method = LZMA2:96k
Solid = +
Blocks = 1

Everything is Ok

Folders: 1
Files: 12
Size: 90238
Compressed: 7204

${filename} in both cases is a fully qualified name to the file, like this:
/dir/subdir/myTestFile.7z.

I can't seem to get the ESC output stream to be the extracted files.
Anything jump out at you?

On Wed, Sep 28, 2022 at 8:06 AM stephen.hindmarch.bt.com via users <
users@nifi.apache.org> wrote:

> Hi James,
>
>
>
> I am not in a position to test this right now, but you have to think of
> the flowfile content as STDIN and STDOUT. So with 7zip you need to use the
> “-si” and “-so” flags to ensure there are no files involved. Then if you
> can load the content of a file into a flowfile, eg with GetFile, then you
> should be able to unpack it with ExecuteStreamCommand. Set “Ignore STDIN” =
> “false”.
>
>
>
> I have written up my own use case on github. This involves having a Redis
> script as the input, and results of the script as the output.
>
>
>
> my-nifi-cluster/experiment-redis_direct.md at main ·
> hindmasj/my-nifi-cluster · GitHub
> <https://github.com/hindmasj/my-nifi-cluster/blob/main/docs/experiment-redis_direct.md>
>
>
>
> The first part of the post shows how to do it with the input commands on
> the command line, so a bit like you running “7za ${filename} -so”. The
> second part has the script inside the flowfile and is treated as STDIN, a
> bit like you doing “unzip -si -so”.
>
>
>
> See if that helps. Fundamentally, if you do “7za -si -so < myfile.7z” on
> the command line and see the output on the console, ExecuteStreamCommand
> will behave the same.
>
>
>
> *Steve Hindmarch*
>
> *From:* James McMahon 
> *Sent:* 28 September 2022 12:02
> *To:* users@nifi.apache.org
> *Subject:* Can ExecuteStreamCommand do this?
>
>
>
> I continue to struggle with ExecuteStreamCommand, and am hoping one of you
> from our user community can help me with the following:
>
> 1. Can ExecuteStreamCommand be used as I am trying to use it?
>
> 2. Can you direct me to an example where ExecuteStreamCommand is
> configured to do something similar to my use case?
>
>
>
> My use case:
>
> The incoming flowfiles in my flow path are 7z zips. Based on what I've
> researched so far, NiFi's native processors don't handle unpacking of 7z
> files.
>
>
>
> I want to read the 7z files as STDIN to ExecuteStreamCommand.
>
> I'd like the processor to call out to a 7za app, which will unpack the 7z.
>
> One incoming flowfile will yield multiple output files. Let's say twelve
> in this case.
>
> My goal is to output those twelve as new flowfiles out of
> ExecuteStreamCommand, to its output stream path.
>
>
>
> I can't yet get this to work. Best I've been able to do is configure
> ExecuteStreamCommand to unpack ${filename} to a temporary output directory
> on disk. Then I have another path in my flow polling that directory every
> few minutes looking for new data. Am hoping to eliminate that intermediate
> write/read to/from disk by keeping this all within the flow and JVM memory.
>
>
>
> Thanks very much in advance for any assistance.
>

