how to loop workflows

2016-10-20 Thread Alessio Palma

Hello all, is there any way to loop a workflow over a list or a counter?
Oozie allows you to break the rules of acyclic graphs with some tricks; is the same
possible in NiFi?


Re: How to increase the processing speed of the ExtractText and ReplaceText Processor?

2016-10-20 Thread prabhu Mahendran
Lee,

I have tried the flow you suggested, which is able to insert the data into SQL
Server in 50 minutes, and that still takes a long time.

==> Your query:
*You might be processing the entire dat file (instead of a single row) for
each record.*
  How can I process the entire .dat file into SQL Server?


==> Your query: *Without any new optimizations you'll need ~25 threads and
sufficient memory to feed the threads.*
  My processors run with only 10 threads even after setting concurrent tasks. How
can I increase this to 25 threads?

If you tried a quick test, could you please share the regex you used?

Is there any other processor with functionality similar to ExtractText?

Thanks



On Wed, Oct 19, 2016 at 11:29 PM, Lee Laim  wrote:

> Prabu,
>
> In order to move 3M rows in 10 minutes, you'll need to process 5000
> rows/second.
> During your 4 hour run, you were processing ~200 rows/second.
>
> Without any new optimizations you'll need ~25 threads and sufficient
> memory to feed the threads.  I agree with Mark that you should be able to
> get far more than 200 rows/second.
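>
> As a quick arithmetic check (assuming the full 3M-row file):
>
>   3,000,000 rows / 600 s    = 5,000 rows/s  (needed for a 10 minute load)
>   3,000,000 rows / 14,400 s ≈ 208 rows/s    (observed over the 4 hour run)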
>
> I ran a quick test using your ExtractText regex on similar data and was able
> to process over 100,000 rows/minute through the ExtractText processor.
> The input data was a single row of 4 fields delimited by the "|" symbol.
>
> *You might be processing the entire dat file (instead of a single row) for
> each record.*
> *Can you check the flow file attributes and content going into
> ExtractText?  *
>
>
> Here is the flow with some notes:
>
> 1.GetFile (a 30 MB .dat file consisting of 3M rows; each row is about 10
> bytes)
>
> 2. SplitText -> SplitText  (to break the 3M rows down to manageable chunks
> of 10,000 lines per flow file, then split again to 1 line per flow file)
>
> 3. ExtractText to extract the 4 fields
>
> 4. ReplaceText to generate json (You can alternatively use
> AttributesToJson here)
>
> 5. ConvertJSONtoSQL
>
> 6. PutSQL - (This should be true bottleneck; Index the DB well and use
> many threads)
>
>
> If my assumptions are incorrect, please let me know.
>
> Thanks,
> Lee
>
> On Thu, Oct 20, 2016 at 1:43 AM, Kevin Verhoeven <
> kevin.verhoe...@ds-iq.com> wrote:
>
>> I’m not clear on how much data you are processing; does the data (.dat)
>> file have 3,00,000 rows?
>>
>>
>>
>> Kevin
>>
>>
>>
>> *From:* prabhu Mahendran [mailto:prabhuu161...@gmail.com]
>> *Sent:* Wednesday, October 19, 2016 2:05 AM
>> *To:* users@nifi.apache.org
>> *Subject:* Re: How to increase the processing speed of the ExtractText
>> and ReplaceText Processor?
>>
>>
>>
>> Mark,
>>
>> Thanks for the response.
>>
>> My sample input data (.dat) looks like the following:
>>
>> 1|2|3|4
>> 6|7|8|9
>> 11|12|13|14
>>
>> In ExtractText, I have added only an input row property in addition to the
>> default properties, as shown in the screenshot below.
>>
>> [image: Inline image 1]
>> In ReplaceText, I just replace the value with
>> {"data1":"${inputrow.1}","data2":"${inputrow.2}","data3":"${inputrow.3}","data4":"${inputrow.4}"}
>> [image: Inline image 2]
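>>
>> In text form, the configuration described above is presumably something like the
>> following (the property name "inputrow" is inferred from the ${inputrow.N}
>> references, and the regex is the one Mark quotes further down, so this is a
>> reconstruction rather than the exact screenshots):
>>
>> ExtractText  - dynamic property:   inputrow = (.+)[|](.+)[|](.+)[|](.+)
>> ReplaceText  - Replacement Value:  {"data1":"${inputrow.1}","data2":"${inputrow.2}","data3":"${inputrow.3}","data4":"${inputrow.4}"}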
>>
>>
>> There are no bulletins indicating back pressure on the processors.
>>
>> Can I know the prerequisites needed to move the 3,00,000 rows into SQL
>> Server within 10-20 minutes?
>> How many CPUs are needed?
>> How much heap size and PermGen size do we need to set to move that data
>> into SQL Server?
>>
>> Thanks
>>
>>
>>
>> On Tue, Oct 18, 2016 at 7:05 PM, Mark Payne  wrote:
>>
>> Prabhu,
>>
>>
>>
>> Thanks for the details. All of this seems fairly normal. Given that you
>> have only a single core,
>>
>> I don't think multiple concurrent tasks will help you. Can you share your
>> configuration for ExtractText
>>
>> and ReplaceText? Depending on the regex'es being used, they can be
>> extremely expensive to evaluate.
>>
>> The regex that you mentioned in the other email -
>> "(.+)[|](.+)[|](.+)[|](.+)" is in fact extremely expensive.
>>
>> Any time that you have ".*" or ".+" in your regex, it is going to be
>> extremely expensive, especially with
>>
>> longer FlowFile content.
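>>
>> A common alternative (a sketch, not something prescribed in this thread) is to
>> use negated character classes so the engine never has to backtrack across the
>> "|" delimiters, for example:
>>
>> ^([^|]+)\|([^|]+)\|([^|]+)\|([^|]+)$
>>
>> This captures the same four pipe-delimited fields but evaluates in roughly
>> linear time on a single row.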
>>
>>
>>
>> Also, do you see any bulletins indicating that the provenance repository
>> is applying backpressure? Given
>>
>> that you are splitting your FlowFiles into individual lines, the
>> provenance repository may be under a lot
>>
>> of pressure.
>>
>>
>>
>> Another thing to check, is how much garbage collection is occurring. This
>> can certainly destroy your performance
>>
>> quickly. You can get this information by going to the "Summary Table" in
>> the top-right of the UI and then clicking the
>>
>> "System Diagnostics" link in the bottom-right corner of that Summary
>> Table.
>>
>>
>>
>> Thanks
>>
>> -Mark
>>
>>
>>
>>
>>
>> On Oct 18, 2016, at 1:31 AM, prabhu Mahendran 
>> wrote:
>>
>>
>>
>> Mark,
>>
>> Thanks for your response.
>>
>> Please find the response for your questions.
>>
>> ==>The first processor that you see that exhibits poor performance is
>> ExtractText, correct?
>>  Yes, ExtractText exhibits poor performance.
>>
>> ==>How big

Re: Upgrade 0.6.1 to 1.0.0 problems with Remote Process Groups

2016-10-20 Thread Conrad Crampton
Ok,
So I tried everything suggested so far to no avail unfortunately.

So what I have done is to create all new certs etc. using the toolkit. Updated 
my existing authorized-users.xml to match the full cert distinguished 
names CN=server.name, OU=NIFI etc.

Recreated all my remote process groups to not reference the original NCM as 
that still wouldn’t work – after a complete new install (upgrade).

So now what I have is a six node cluster using the original data/worker nodes and 
they are part of the cluster – all appears to be working, i.e. I can log into the 
UI (nice by the way ;-) on each server. There are no SSL handshake errors etc. 
BUT the RPGs (newly created) still don’t appear to be working. I am getting

11:34:24 GMT  WARNING  e19ccf8e-0157-1000--63bfd9c0  nifi6-cm1.local:9443
  org.apache.nifi.remote.client.PeerSelector@782af623  Unable to refresh Remote
  Group's peers due to Unable to communicate with remote NiFi cluster in order
  to determine which nodes exist in the remote cluster
11:34:25 GMT  WARNING  e19ccf8e-0157-1000--63bfd9c0  nifi1-cm1.local:9443
  org.apache.nifi.remote.client.PeerSelector@3a547274  Unable to refresh Remote
  Group's peers due to Unable to communicate with remote NiFi cluster in order
  to determine which nodes exist in the remote cluster
11:34:25 GMT  WARNING  e1990203-0157-1000--9ff40dc0  nifi2-cm1.local:9443
  org.apache.nifi.remote.client.PeerSelector@54c2df1  Unable to refresh Remote
  Group's peers due to Unable to communicate with remote NiFi cluster in order
  to determine which nodes exist in the remote cluster
11:34:25 GMT  WARNING  e1990203-0157-1000--9ff40dc0  nifi5-cm1.local:9443
  org.apache.nifi.remote.client.PeerSelector@50d59f3c  Unable to refresh Remote
  Group's peers due to Unable to communicate with remote NiFi cluster in order
  to determine which nodes exist in the remote cluster
11:34:26 GMT  WARNING  e19ccf8e-0157-1000--63bfd9c0  nifi2-cm1.local:9443
  org.apache.nifi.remote.client.PeerSelector@97c92ef  Unable to refresh Remote
  Group's peers due to Unable to communicate with remote NiFi cluster in order
  to determine which nodes exist in the remote cluster
11:34:26 GMT  WARNING  e1990203-0157-1000--9ff40dc0  nifi6-cm1.local:9443
  org.apache.nifi.remote.client.PeerSelector@70663037  Unable to refresh Remote
  Group's peers due to Unable to communicate with remote NiFi cluster in order
  to determine which nodes exist in the remote cluster
11:34:27 GMT  WARNING  e1990203-0157-1000--9ff40dc0  nifi4-cm1.local:9443
  org.apache.nifi.remote.client.PeerSelector@3c040426  Unable to refresh Remote
  Group's peers due to Unable to communicate with remote NiFi cluster in order
  to determine which nodes exist in the remote cluster

I can telnet from server to server on both https (UI) port and S2S port.
I am really at a loss as to what to do now.

Data is queuing up in my input processors with nowhere to go.
Do I have to do something radical here to get this working like stopping 
everything, clearing out all the queues then starting up again??? I really 
don’t want to do this obviously but I am getting nowhere on this – two days of 
frustration with nothing to show for it.

Any more suggestions please??
Thanks for your patience.
Conrad


From: Andy LoPresto 
Reply-To: "users@nifi.apache.org" 
Date: Wednesday, 19 October 2016 at 18:24
To: "users@nifi.apache.org" 
Subject: Re: Upgrade 0.6.1 to 1.0.0 problems with Remote Process Groups

* PGP Signed by an unknown key
Hi Conrad,

Bryan is correct that changing the certificates (and the encapsulating 
keystores and truststores) will not affect any data held in the nodes.

Regenerating everything using the TLS toolkit should hopefully not be too 
challenging, but I am also curious as to why you are getting these handshake 
exceptions now. As Bryan pointed out, adding the following line to 
bootstrap.conf will provide substantial additional log output which should help 
trace the issue.

java.arg.15=-Djavax.net.debug=ssl,handshake

You can also imitate the node connecting to the (previous) NCM via this command:

$ openssl s_client -connect <ncm_host:port> -debug -state -cert <node_cert.pem> \
  -key <node_key.pem> -CAfile <ca_cert.pem>


Where:

  * <ncm_host:port> = the hostname and port of the “NCM”
  * <node_cert.pem> = the public key used to identify the “node” (can
be exported from the node keystore [1])
  * <node_key.pem> = the private key used to identify the “node” (can
be exported from the node keystore via a 2 step process)
  * <ca_cert.pem> = the public key used to sign the “NCM”
certificate (could be a 3rd party like Verisign or DigiCert, or an internal
organization CA if you have one)

If you’ve already regenerated everything and it works, that’s fine. But if you 
have the time to try and investigate the old certs, we are interested and 
prepared to help. Thanks.

[1] https://security.stackexchange.com/a/66865/16485

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

On O

Re: how to loop workflows

2016-10-20 Thread Koji Kawamura
Hello Alessio,

I have an example NiFi template to loop workflow using counter
attribute and NiFi expression:
https://gist.github.com/ijokarumawak/01c4fd2d9291d3e74ec424a581659ca8

A NiFi data flow can be cyclic, looping the same FlowFile until a certain
condition is met.
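
As a rough sketch of that pattern (processor and property names here are
illustrative, not copied from the gist): an UpdateAttribute processor increments
a counter attribute, e.g.

  counter = ${counter:replaceNull(0):plus(1)}

and a RouteOnAttribute processor routes the FlowFile back into the loop while the
condition still holds, e.g. a routing property

  loop = ${counter:lt(10)}

with the "loop" relationship wired back to the start of the cycle and the
unmatched relationship continuing downstream.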

Koji

On Thu, Oct 20, 2016 at 5:27 PM, Alessio Palma
 wrote:
>
> Hello all, is there any way to loop a workflow over a list or a counter?
> Oozie allows you to break the rules of acyclic graphs with some tricks; is the 
> same possible in NiFi?


Re: Upgrade 0.6.1 to 1.0.0 problems with Remote Process Groups

2016-10-20 Thread Andy LoPresto
Conrad,

For the site-to-site did you follow the instructions here [1]? Each node needs 
to be added as a user in order to make the connections.

[1] 
http://bryanbende.com/development/2016/08/30/apache-nifi-1.0.0-secure-site-to-site
 


Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69


provenance & content repos re infosec

2016-10-20 Thread Jeremy Farbota
Hello,

I'm using NiFi in a compliance setting. One of my use cases is for
deheading (hashing names, ssns, etc) and republishing. It works great for
these tasks but I need to cover my bases to make sure things are not stored
on disk. E.g. when I extract a name to an attribute for hashing, I do not
want to store it unencrypted at rest in the provenance repo.

It seems I can turn off the content repo with this setting:
nifi.content.repository.archive.enabled=false

Is flowfile content stored on disk anywhere once the flowfile is dropped
with the setting above?

Regarding the provenance repo, the settings offer the ability to truncate
the attribute on retrieval e.g.

nifi.provenance.repository.max.attribute.length=8

Does the above setting change only what can be retrieved or does it limit
what is stored?

If it is still storing all the attributes, then I will likely need to
greatly reduce the provenance repo max.storage.time. Would severely
limiting the provenance or content repo negatively affect NiFi's
performance?

Is there a way that I can have these "secure" settings only for certain
templates? Or are these provenance and content repo setting only
configurable server wide?

Has there ever been thought to enable encryption at rest of the provenance
repo to deal with situations like mine?

Thanks in advance.

-- 

Jeremy Farbota
Software Engineer, Data
jfarb...@payoff.com • (217) 898-8110


Re: provenance & content repos re infosec

2016-10-20 Thread Andy LoPresto
Hi Jeremy,

These are great questions and I appreciate your interest in securing data at 
all stages for your application.

Setting nifi.content.repository.archive.enabled=false will turn off content 
repository archiving, but the content will still sit at rest on the file system 
for some period of time (while the data is in use during the flow). To 
completely avoid persisting any content data to the file system, set 
nifi.content.repository.implementation=org.apache.nifi.controller.repository.VolatileContentRepository.
 This will direct NiFi to store the content in-memory during operation (with 
the understanding that power loss could cause data loss).

You can set a similar value to do the same with the provenance repository, with 
the same caveat. 
nifi.provenance.repository.implementation=org.apache.nifi.provenance.VolatileProvenanceRepository.
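
Taken together, a nifi.properties sketch for the fully in-memory setup would look
something like this (only the properties discussed here; everything else left at
its defaults):

  nifi.content.repository.implementation=org.apache.nifi.controller.repository.VolatileContentRepository
  nifi.provenance.repository.implementation=org.apache.nifi.provenance.VolatileProvenanceRepository

With the volatile content repository in place, the content archive setting should
no longer matter, since content is not persisted to disk in the first place.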

Unfortunately, at this time these settings are global for all NiFi data, rather 
than specific to a processor/process group.

I am working on efforts to provide the following features (and need to get them 
posted in the wiki roadmap to solicit feedback from the community):

* Transparent data encryption for repositories
  * Provenance
  * Content
  * Flowfile (attributes)
  * Sensitive attributes
* Cryptographic signatures for provenance event records and lineage chains
* Features to ease data segmentation/isolation (i.e. raw data comes into input 
port/source processor, it is routed by attribute/signature to different 
nodes/clusters with varying security levels or underlying security 
hardening/policies)

I would suggest you stay tuned to the mailing list (off the top of my head, I 
can’t remember if changes to the wiki are posted to users@, so you might want 
to subscribe to dev@ as well) and welcome your input on these feature 
development efforts. There are some other members of our community similarly 
security-minded, and I think we will get some great collaboration on this 
moving forward.

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69



Re: provenance & content repos re infosec

2016-10-20 Thread Jeremy Farbota
Andy,

Immense thanks for the thoroughly helpful response.

I'll join the dev list and look forward to hearing about the new features.
This is great news and all of those features are things we would use.

Kindly,



-- 

Jeremy Farbota
Software Engineer, Data
jfarb...@payoff.com • (217) 898-8110



attributesToJSON order

2016-10-20 Thread Paul Gibeault (pagibeault)
Hello all,
  I thought I would run this by you before I created a Jira ticket.

The processor attributesToJSON does not create a JSON document with key/values 
in the same order as provided in the processor's configuration.

Example:
  AttributesList: computationName,computationType,strategyName

Output:
{
  " strategyName" : "blue",
  " computationType" : "21DC8X32",
  " computationName" : "453d6c4f-fdd-e611-80c9-0050233e88"
}

This behavior is coming from the datatype used in the attributesToJSON 
processor:

protected Map<String, String> buildAttributesMapForFlowFile(FlowFile ff, String atrList,
                                                             boolean includeCoreAttributes,
                                                             boolean nullValForEmptyString) {

    Map<String, String> atsToWrite = new HashMap<>();

    . . .
}

Using another data type that preserves order would correct this behavior. The 
JSON specification does say that object members are order independent, but that 
does not necessarily mean we should introduce the disorder ourselves; see the 
quick sketch below.
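
For illustration, a minimal Java sketch (hypothetical, not the processor's actual
code) showing that a LinkedHashMap keeps insertion order where a HashMap does not:

import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class MapOrderDemo {
    public static void main(String[] args) {
        String[] keys = {"computationName", "computationType", "strategyName"};

        // HashMap orders entries by hash bucket, not by insertion order
        Map<String, String> unordered = new HashMap<>();
        // LinkedHashMap iterates in the order keys were put()
        Map<String, String> ordered = new LinkedHashMap<>();

        for (String key : keys) {
            unordered.put(key, "x");
            ordered.put(key, "x");
        }

        System.out.println(unordered.keySet()); // order not guaranteed
        System.out.println(ordered.keySet());   // [computationName, computationType, strategyName]
    }
}

Swapping the HashMap in buildAttributesMapForFlowFile for a LinkedHashMap would be
the smallest such change.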

Should we create a JIRA ticket and solution for this?

Thanks,
Paul Gibeault



Re: attributesToJSON order

2016-10-20 Thread Aldrin Piri
Hey Paul,

Could you highlight the use case you are looking to address or shortcoming
that has emerged because of this?  No strong qualms with providing it, just
not sure I am tracking where this becomes problematic.

Thanks,
Aldrin



Re: attributesToJSON order

2016-10-20 Thread Bryan Rosander
I'm not sure of the use case either but if there is one, it should be
pretty easy to let the user select the type of Map they'd prefer to store
it in.

HashMap should probably be default if order doesn't matter, LinkedHashMap
if you want to maintain insertion order, TreeMap if you want alphabetical
order.

Thanks,
Bryan

>