Re: how to copy root fields in Jolt transform?

2018-09-04 Thread Andy LoPresto
There are some slight differences between NiFi’s Jolt and the test site: NiFi 
automatically applies a chain operation, but you can change that in the 
processor configuration. I’d also recommend using the Custom View on the 
JoltTransformJSON (JTJ) processor for a better interface when applying your 
transform.

The code you’re using to iterate over the flowfile attributes should work, but 
you can optimize/Groovy-fy it. Why not just write to the log for debugging 
rather than to a remote file? Also, this will overwrite the same file every 
time a flowfile comes through this script.


Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Sep 4, 2018, at 4:05 PM, l vic wrote:
> 
> Thank you, it works on the Jolt demo site. With real NiFi it doesn't. I am 
> trying to print those values in ExecuteScript with a Groovy script iterating 
> over the flowfile attributes:
> import org.apache.commons.io.IOUtils
> import java.nio.charset.StandardCharsets
> import org.apache.nifi.processor.io.StreamCallback
> 
> def flowFile = session.get()
> def file1 = new File('/home/me/groovy/attributes.txt')
> file1.write '##\n'
> flowFile.getAttributes().each { key,value ->
>   file1.append(key+' :')
>   file1.append(value + '\n')
> }
> session.transfer(flowFile,REL_SUCCESS)
> None of them show up. Is there some way to read them from ExecuteScript?
> 
> On Tue, Sep 4, 2018 at 12:56 PM Matt Burgess wrote:
> Add the following to the end of your shift spec:
> 
> "*": "&"
> 
> This (at the root) matches any key not already matched and puts it in
> the output at the same location.
> 
> Regards,
> Matt
> 
> On Tue, Sep 4, 2018 at 11:39 AM l vic wrote:
> >
> > Hi,
> > I want to "flatten" the following object:
> > {
> >   "id": 0,
> >   "name": "Root",
> >   "mylist": [
> >     { "id": 10, "info": "2am-3am" },
> >     { "id": 11, "info": "3AM-4AM" },
> >     { "id": 12, "info": "4am-5am" }
> >   ]
> > }
> > I figured out how to "flatten" the array, but the "root" values are gone in
> > the transformed output:
> > {
> >   "mylist-0-id": 10,
> >   "mylist-0-info": "2am-3am",
> >
> > }
> > How can I retain the "root" fields so that the transformed output looks like
> > the one below?
> > {
> >   "id": 0,
> >   "name": "Root",
> >   "mylist-0-id": 10,
> >   "mylist-0-info": "2am-3am",
> >
> > }
> >
> >



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: how to copy root fields in Jolt transform?

2018-09-04 Thread l vic
Thank you, it works on the Jolt demo site. With real NiFi it doesn't. I am
trying to print those values in ExecuteScript with a Groovy script iterating
over the flowfile attributes:
import org.apache.commons.io.IOUtils
import java.nio.charset.StandardCharsets
import org.apache.nifi.processor.io.StreamCallback

// Fetch the next flowfile; session.get() returns null when the queue is empty
def flowFile = session.get()
if (!flowFile) return

// Debug output file; write() truncates it each time the script runs
def file1 = new File('/home/me/groovy/attributes.txt')
file1.write '##\n'
// Append one "key : value" line per flowfile attribute
flowFile.getAttributes().each { key, value ->
  file1.append(key + ' :')
  file1.append(value + '\n')
}
session.transfer(flowFile, REL_SUCCESS)
None of them show up. Is there some way to read them from ExecuteScript?

On Tue, Sep 4, 2018 at 12:56 PM Matt Burgess  wrote:

> Add the following to the end of your shift spec:
>
> "*": "&"
>
> This (at the root) matches any key not already matched and puts it in
> the output at the same location.
>
> Regards,
> Matt
>
> On Tue, Sep 4, 2018 at 11:39 AM l vic  wrote:
> >
> > Hi,
> > I want to "flatten" the following object:
> > {
> >   "id": 0,
> >   "name": "Root",
> >   "mylist": [
> >     { "id": 10, "info": "2am-3am" },
> >     { "id": 11, "info": "3AM-4AM" },
> >     { "id": 12, "info": "4am-5am" }
> >   ]
> > }
> > I figured out how to "flatten" the array, but the "root" values are gone in
> > the transformed output:
> > {
> >   "mylist-0-id": 10,
> >   "mylist-0-info": "2am-3am",
> >
> > }
> > How can I retain the "root" fields so that the transformed output looks like
> > the one below?
> > {
> >   "id": 0,
> >   "name": "Root",
> >   "mylist-0-id": 10,
> >   "mylist-0-info": "2am-3am",
> >
> > }
> >
> >
>


Jolt doesn't work?

2018-09-04 Thread l vic
I have a JSON record that contains the array "mylist":
{
  "id": 0,
  "name": "Root",
  "mylist": [
    { "id": 10, "info": "2am-3am" },
    { "id": 11, "info": "3AM-4AM" }
  ]
}
I tested this Jolt spec:

[
  {
    "operation": "shift",
    "spec": {
      "mylist": {
        "*": {
          "id": "mylist-&1-id",
          "info": "mylist-&1-info"
        }
      },
      "*": "&"
    }
  }
]
to transform it into a "flat" version:
{
  "id": 0,
  "name": "Root",
  "mylist-0-id": 10,
  "mylist-0-info": "2am-3am",
  "mylist-1-id": 11,
  "mylist-1-info": "3AM-4AM"
}
However, when I put it all together to print into the file, only the "root"
properties are printed:
GetFile->JoltTransformJSON
I have the following exception in Jolt:
java.lang.NullPointerException: null
    at java.util.ArrayList.addAll(ArrayList.java:581)
    at com.bazaarvoice.jolt.defaultr.Key.applyChildren(Key.java:158)
    at com.bazaarvoice.jolt.Defaultr.transform(Defaultr.java:242)
    at org.apache.nifi.processors.standard.util.jolt.TransformUtils.transform(TransformUtils.java:30)
    at org.apache.nifi.processors.standard.JoltTransformJSON.onTrigger(JoltTransformJSON.java:277)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
    at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2018-09-04 16:10:55,008 INFO [Flow Service Tasks Thread-1] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@4dd21cb1 // Another save pending = false

Any idea what I am doing wrong?


Re: Wrapping a JSON string

2018-09-04 Thread Steve Champagne
I was hoping to use the record processors otherwise I would definitely use
jolt. I was eventually able to get what I was after using a ScriptedReader.
I do like 'payload' better than 'json_string' though. :)
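
For readers following the thread, here is a small standalone Python sketch of the wrapping being discussed: the raw JSON text is embedded as an escaped string (the effect escapeJson has on the quotes), so it can be parsed back unchanged, unlike the map-style rendering shown in the original question. This is only an illustration and not NiFi code; the wrap_payload helper and the sample record are assumptions made for the example.

```python
import json


def wrap_payload(raw_json, field="payload"):
    """Return a JSON document whose `field` holds raw_json as an escaped string."""
    # json.dumps escapes the inner quotes (" becomes \") so the text survives intact
    return json.dumps({field: raw_json})


# Sample record (illustrative), kept as raw text rather than a parsed object
raw = '{"@timestamp":"2018-01-01T00:00:00.000Z","id":1,"name":"John"}'
wrapped = wrap_payload(raw)
print(wrapped)

# Round trip: the wrapped field is a string that is itself valid JSON
inner_text = json.loads(wrapped)["payload"]
restored = json.loads(inner_text)
print(restored["name"])
```

The point of the round trip is that the consumer gets the exact original text back and decides when to parse it.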

Thanks,
Steve

On Mon, Sep 3, 2018 at 3:09 AM DEHAY Aurelien wrote:

> Hello.
>
> You will have to remove the “ in the wrapped json, and process it like a
> standard string, so I would use the text processors instead of a json one.
>
> We did something different on our side: we use escapeJson to encapsulate
> the json payload, escaped (“ => \” mainly), in a payload string:
>
> Extract the value with EvaluateJsonPath, with the payload attribute set to $
>
> Update the attribute with payload = ${payload:escapeJson()}
>
> And a jolt processor with default operation like:
>
> [
>   {
>     "operation": "shift",
>     "spec": {
>       "PLANT": "DE_PLANT"
>     }
>   },
>   {
>     "operation": "default",
>     "spec": {
>       "PAYLOAD": "${payload}"
>     }
>   }
> ]
>
> *Aurélien DEHAY *Big Data Architect
> +33 616 815 441
>
> aurelien.de...@faurecia.com
>
> 2 rue Hennape - 92735 Nanterre Cedex – France
>
>
> *From:* Steve Champagne [mailto:champa...@gmail.com]
> *Sent:* mardi 28 août 2018 00:53
> *To:* users@nifi.apache.org
> *Subject:* Wrapping a JSON string
>
> Hello,
>
> I'm ingesting some JSON data that I'd like to wrap in a json_string field
> as a string type. I tried using a JsonPathReader with a dynamic property
> 'json_string' and a value of $, but I seem to be getting back a string
> version of the JSON:
>
> {"partition_date":"2018-01-01T00:00:00.000Z","json_string":"{@timestamp=2018-01-01T00:00:00.000Z, id=1, name=John}"}
>
> I was wondering if there was a way that I could do this and preserve the
> raw JSON format?
>
> Thanks,
> Steve
>


Re: how to copy root fields in Jolt transform?

2018-09-04 Thread Matt Burgess
Add the following to the end of your shift spec:

"*": "&"

This (at the root) matches any key not already matched and puts it in
the output at the same location.
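
As a rough illustration of what the combined shift spec does (this is not Jolt itself), the sketch below mimics it in plain Python: entries of "mylist" are flattened to "mylist-<index>-<field>" keys, and every other root key is copied through unchanged, which is the effect of the "*": "&" rule. The function name flatten_with_root and its parameters are made up for this example.

```python
def flatten_with_root(record, list_key="mylist", fields=("id", "info")):
    """Mimic the shift spec: flatten list_key, copy all other root keys."""
    output = {}
    for key, value in record.items():
        if key == list_key:
            # Counterpart of "mylist": {"*": {"id": "mylist-&1-id", ...}}
            for index, item in enumerate(value):
                for field in fields:
                    if field in item:
                        output[f"{list_key}-{index}-{field}"] = item[field]
        else:
            # Counterpart of "*": "&" at the root: keep unmatched keys as-is
            output[key] = value
    return output


record = {
    "id": 0,
    "name": "Root",
    "mylist": [
        {"id": 10, "info": "2am-3am"},
        {"id": 11, "info": "3AM-4AM"},
    ],
}
flat = flatten_with_root(record)
print(flat)
```

Without the else branch the root "id" and "name" would be dropped, which matches the behavior described in the question.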

Regards,
Matt

On Tue, Sep 4, 2018 at 11:39 AM l vic  wrote:
>
> Hi,
> I want to "flatten" the following object:
> {
>   "id": 0,
>   "name": "Root",
>   "mylist": [
>     { "id": 10, "info": "2am-3am" },
>     { "id": 11, "info": "3AM-4AM" },
>     { "id": 12, "info": "4am-5am" }
>   ]
> }
> I figured out how to "flatten" the array, but the "root" values are gone in
> the transformed output:
> {
>   "mylist-0-id": 10,
>   "mylist-0-info": "2am-3am",
>
> }
> How can I retain the "root" fields so that the transformed output looks like
> the one below?
> {
>   "id": 0,
>   "name": "Root",
>   "mylist-0-id": 10,
>   "mylist-0-info": "2am-3am",
>
> }
>
>


Re: Managing Secrets in NiFi Registry

2018-09-04 Thread Andy LoPresto
Jonathan,

Bryan gave a really good response to this. As he mentioned, there has 
definitely been discussion about providing this feature and further securing 
the variable registry and the integration between NiFi and NiFi Registry.

I’ve listed some open Jiras at the bottom which further document some of the 
thought process and capture the current state. I have wanted to resolve these, 
but other tasks have been prioritized. Now that NiFi Registry adoption is 
growing as you mention, these are coming to the forefront.

https://issues.apache.org/jira/browse/NIFI-2653 

https://issues.apache.org/jira/browse/NIFI-3110 

https://issues.apache.org/jira/browse/NIFI-5364 




Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Sep 4, 2018, at 9:25 AM, Bryan Bende  wrote:
> 
> Hello,
> 
> You are correct that currently variables cannot be marked as
> sensitive, and for that reason they currently shouldn't be used to
> store sensitive values since they will be stored in plain-text in the
> flow.xml.gz and also in the versioned flows saved to registry.
> 
> I do think the concept of sensitive variables makes sense, and it was
> discussed in detail on the dev list a while ago [1], but no one has
> taken on the effort to implement it yet.
> 
> Currently the values of sensitive properties are stripped out when
> saving a flow to registry. When you import a flow the values then need
> to be set, but only one time and it is excluded from being considered
> as a local change, so in the future when a new version of the flow is
> saved to registry and then the prod flow is upgraded, it will retain
> that value.
> 
> If you are automating deployments then you should be able to do almost
> the same thing you are doing with config files per environment, but
> just shift to sourcing the flow from registry, rather than from a
> template (I assume that is what you are doing now).
> 
> The NiFi CLI [2] has commands to deploy the flow, but layering on the
> sensitive values is something that would have to be done in a
> tool/scripts that wrap the CLI. It could be a nice addition to the CLI
> to take a config file and a process group id and populate all of the
> necessary values.
> 
> -Bryan
> 
> [1] https://markmail.org/thread/53fzpsjbkp3uyxhb
> [2] https://github.com/apache/nifi/tree/master/nifi-toolkit/nifi-toolkit-cli
> 
> 
> On Tue, Sep 4, 2018 at 11:12 AM, Jonathan Meran wrote:
>> Hello,
>> 
>> I am looking for some guidance on managing sensitive property values for
>> things such as credentials in a DBCPConnectionPool within the NiFi Registry
>> Development Life Cycle.
>> 
>> 
>> 
>> Currently we have rolled our own deployment tool in which we manage
>> configuration files per environment (Dev, QA, Prod, etc) and use the NiFi
>> API to deploy our Process Group and all the environment-specific properties.
>> We are looking to make the switch to using NiFi Registry instead of our own
>> tool but I don’t see a way to properly manage secrets.
>> 
>> 
>> 
>> I believe we could use the Variable Registry but I have a few concerns with
>> that approach:
>> 
>> 1. Not all Processors and Controller Services support Expression Language, so
>>    we may have limitations with referencing properties and secrets inside the
>>    Variable Registry.
>> 2. There is no way (that I can tell) to mark a Variable as “sensitive” so that
>>    it is write-only and not readable by other NiFi users after being set.
>> 3. Are “sensitive” properties encrypted at rest inside flow.xml.gz? If so,
>>    then we also lose encryption-at-rest if we use Variable Registry.
>>
>> I’m certain that every other NiFi Registry user will run into these same
>> issues, so I am curious what others have done and what security trade-offs
>> they have made to continue on with the efficiency of using NiFi Registry.
>> 
>> 
>> 
>> Thanks,
>> 
>> Jon
>> 
>> 
>> 
>> 
>> 
>> 





Re: Managing Secrets in NiFi Registry

2018-09-04 Thread Bryan Bende
Hello,

You are correct that currently variables cannot be marked as
sensitive, and for that reason they currently shouldn't be used to
store sensitive values since they will be stored in plain-text in the
flow.xml.gz and also in the versioned flows saved to registry.

I do think the concept of sensitive variables makes sense, and it was
discussed in detail on the dev list a while ago [1], but no one has
taken on the effort to implement it yet.

Currently the values of sensitive properties are stripped out when
saving a flow to registry. When you import a flow the values then need
to be set, but only one time and it is excluded from being considered
as a local change, so in the future when a new version of the flow is
saved to registry and then the prod flow is upgraded, it will retain
that value.

If you are automating deployments then you should be able to do almost
the same thing you are doing with config files per environment, but
just shift to sourcing the flow from registry, rather than from a
template (I assume that is what you are doing now).

The NiFi CLI [2] has commands to deploy the flow, but layering on the
sensitive values is something that would have to be done in a
tool/scripts that wrap the CLI. It could be a nice addition to the CLI
to take a config file and a process group id and populate all of the
necessary values.
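
The wrapper idea in the paragraph above could look roughly like the following. This is a plain-Python sketch with no NiFi or CLI calls: the data shapes (a dict of component names to property dicts, with None marking a sensitive value that was stripped on save), the sample property names, and the function name fill_sensitive_values are all assumptions for illustration only.

```python
def fill_sensitive_values(components, env_config):
    """Fill stripped (None) properties from a per-environment config.

    Returns a new components dict plus a list of (component, property)
    pairs that still have no value after applying env_config.
    """
    filled = {}
    missing = []
    for name, properties in components.items():
        overrides = env_config.get(name, {})
        new_props = dict(properties)  # copy so the input is left untouched
        for prop, value in properties.items():
            if value is None:
                if prop in overrides:
                    new_props[prop] = overrides[prop]
                else:
                    missing.append((name, prop))
        filled[name] = new_props
    return filled, missing


# Hypothetical imported flow: the Password value was stripped when saved
imported = {
    "DBCPConnectionPool": {"Database User": "nifi", "Password": None},
}
# Hypothetical per-environment config file contents
prod_config = {"DBCPConnectionPool": {"Password": "prod-secret"}}

filled, missing = fill_sensitive_values(imported, prod_config)
print(filled)
print(missing)
```

Reporting the still-missing pairs, rather than failing silently, lets a deployment script stop before starting a flow with unset credentials.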

-Bryan

[1] https://markmail.org/thread/53fzpsjbkp3uyxhb
[2] https://github.com/apache/nifi/tree/master/nifi-toolkit/nifi-toolkit-cli


On Tue, Sep 4, 2018 at 11:12 AM, Jonathan Meran wrote:
> Hello,
>
> I am looking for some guidance on managing sensitive property values for
> things such as credentials in a DBCPConnectionPool within the NiFi Registry
> Development Life Cycle.
>
>
>
> Currently we have rolled our own deployment tool in which we manage
> configuration files per environment (Dev, QA, Prod, etc) and use the NiFi
> API to deploy our Process Group and all the environment-specific properties.
> We are looking to make the switch to using NiFi Registry instead of our own
> tool but I don’t see a way to properly manage secrets.
>
>
>
> I believe we could use the Variable Registry but I have a few concerns with
> that approach:
>
> 1. Not all Processors and Controller Services support Expression Language, so
>    we may have limitations with referencing properties and secrets inside the
>    Variable Registry.
> 2. There is no way (that I can tell) to mark a Variable as “sensitive” so that
>    it is write-only and not readable by other NiFi users after being set.
> 3. Are “sensitive” properties encrypted at rest inside flow.xml.gz? If so,
>    then we also lose encryption-at-rest if we use Variable Registry.
>
> I’m certain that every other NiFi Registry user will run into these same
> issues, so I am curious what others have done and what security trade-offs
> they have made to continue on with the efficiency of using NiFi Registry.
>
>
>
> Thanks,
>
> Jon
>
>
>
>
>
>


how to copy root fields in Jolt transform?

2018-09-04 Thread l vic
Hi,
I want to "flatten" the following object:
{
  "id": 0,
  "name": "Root",
  "mylist": [
    { "id": 10, "info": "2am-3am" },
    { "id": 11, "info": "3AM-4AM" },
    { "id": 12, "info": "4am-5am" }
  ]
}
I figured out how to "flatten" the array, but the "root" values are gone in
the transformed output:
{
  "mylist-0-id": 10,
  "mylist-0-info": "2am-3am",

}
How can I retain the "root" fields so that the transformed output looks like
the one below?
{
  "id": 0,
  "name": "Root",
  "mylist-0-id": 10,
  "mylist-0-info": "2am-3am",

}


Managing Secrets in NiFi Registry

2018-09-04 Thread Jonathan Meran
Hello,
I am looking for some guidance on managing sensitive property values for things 
such as credentials in a DBCPConnectionPool within the NiFi Registry 
Development Life Cycle.

Currently we have rolled our own deployment tool in which we manage 
configuration files per environment (Dev, QA, Prod, etc) and use the NiFi API 
to deploy our Process Group and all the environment-specific properties. We are 
looking to make the switch to using NiFi Registry instead of our own tool but I 
don’t see a way to properly manage secrets.

I believe we could use the Variable Registry but I have a few concerns with 
that approach:

  1.  Not all Processors and Controller Services support Expression Language, so
      we may have limitations with referencing properties and secrets inside the
      Variable Registry.
  2.  There is no way (that I can tell) to mark a Variable as “sensitive” so
      that it is write-only and not readable by other NiFi users after being set.
  3.  Are “sensitive” properties encrypted at rest inside flow.xml.gz? If so,
      then we also lose encryption-at-rest if we use Variable Registry.

I’m certain that every other NiFi Registry user will run into these same issues,
so I am curious what others have done and what security trade-offs they have
made to continue on with the efficiency of using NiFi Registry.

Thanks,
Jon








Re: Best practices for running Apache NiFi in production in a Docker container

2018-09-04 Thread Joe Percivall
Hi Peter,

Thanks for the follow-up. Yup, I agree with the relationship between
Xmx/Xms and UseCGroupMemoryLimitForHeap/MaxRAMFraction.

For MaxRAMFraction=1, though, it seems that it does leave at least some
room for off-heap memory. That said, that may not be enough for NiFi and
its normal use cases. Has anyone run the Native Memory Tracker[1] on a
"real-world" system? It gives a nice dump of where/how the JVM is using
memory[2].
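
As a back-of-the-envelope check on the trade-off discussed in this thread, the sketch below assumes that with UseCGroupMemoryLimitForHeap the default maximum heap is roughly the container memory limit divided by MaxRAMFraction. That formula is a simplification (the JVM's actual sizing may leave some headroom), and the function name and the 4 GiB limit are illustrative assumptions.

```python
GIB = 1024 ** 3


def estimated_heap(container_limit_bytes, max_ram_fraction):
    """Approximate default max heap and what remains for non-heap use."""
    if max_ram_fraction < 1:
        raise ValueError("MaxRAMFraction must be >= 1")
    # Simplified model: heap = container limit / MaxRAMFraction
    heap = container_limit_bytes // max_ram_fraction
    return heap, container_limit_bytes - heap


# Compare a few fractions for an assumed 4 GiB container limit
for fraction in (1, 2, 4):
    heap, rest = estimated_heap(4 * GIB, fraction)
    print(f"MaxRAMFraction={fraction}: heap ~{heap / GIB:.1f} GiB, "
          f"other ~{rest / GIB:.1f} GiB")
```

Under this simple model a fraction of 1 leaves nothing for metaspace, thread stacks, or direct buffers, while a fraction of 2 caps the heap at half the container, which is the tension described in the quoted reply.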

[1]
https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr007.html
[2] http://trustmeiamadeveloper.com/2016/03/18/where-is-my-memory-java/

Joe

On Fri, Aug 31, 2018 at 5:10 AM Peter Wilcsinszky <
peterwilcsins...@gmail.com> wrote:

> Hi,
>
> I haven't done extensive research in this area but ran through the
> articles and also found another one [1]. From what I understand,
> UseCGroupMemoryLimitForHeap is just the dynamic version of setting memory
> limits manually using Xmx and Xms, which is currently done explicitly by the
> NiFi start script. In an environment where it should be done in a more
> dynamic fashion, UseCGroupMemoryLimitForHeap with a proper MaxRAMFraction
> should be used, but for caveats check the comments here: [1] and here: [2]
> (My understanding: MaxRAMFraction=1 is considered unsafe,
> MaxRAMFraction=2 leaves half the memory unused)
>
> [1] https://banzaicloud.com/blog/java-resource-limits/
> [2]
> https://stackoverflow.com/questions/49854237/is-xxmaxramfraction-1-safe-for-production-in-a-containered-environment
>
>
> On Thu, Aug 30, 2018 at 7:54 PM Joe Percivall wrote:
>
>> Hey everyone,
>>
>> I was recently searching for a best practice guide for running a
>> production instance of Apache NiFi within a Docker container and couldn't
>> find anything specific other than the normal guidance for best practices of
>> a high-performance instance[1]. I did expand my search for best practices
>> on running the JVM within a container and found a couple good
>> articles[2][3]. The first of which explains why the JVM will take up more
>> than is set via "Xmx" and the second is about 2 JVM options which were
>> backported from Java 9 to JDK 8u131 specifically for configuring the JVM
>> heap for running in a "VM".
>>
>> So with that, a couple questions:
>> 1: Does anyone have any best practices or lessons learned specifically
>> for running NiFi in a container?
>> 2:  "UseCGroupMemoryLimitForHeap" and "MaxRAMFraction" are technically
>> "Experimental VM Options", has anyone used them in practice?
>>
>> [1]
>> https://community.hortonworks.com/articles/7882/hdfnifi-best-practices-for-setting-up-a-high-perfo.html
>>
>> [2]
>> https://developers.redhat.com/blog/2017/04/04/openjdk-and-containers/#more-433899
>> [3]
>> https://blog.csanchez.org/2017/05/31/running-a-jvm-in-a-container-without-getting-killed/
>>
>> Thanks,
>> Joe
>> --
>> *Joe Percivall*
>> linkedin.com/in/Percivall
>> e: jperciv...@apache.com
>>
>

-- 
*Joe Percivall*
linkedin.com/in/Percivall
e: jperciv...@apache.com