Re: Maintain character encoding in workflow

Mark Payne Thu, 30 Apr 2015 11:54:12 -0700

Adam,

Joe got his answer out before I sent this, I realize :) I'll try to go into a bit more detail on some of the things here, in case it's helpful.

The easiest thing to do would be to make the following changes in nifi.properties:


nifi.provenance.repository.rollover.time=30 secs

nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID

nifi.content.viewer.url=/nifi-content-viewer/

nifi.content.repository.archive.max.retention.period=24 hours
nifi.content.repository.archive.max.usage.percentage=80%

The first property says to index all provenance events after 30 seconds instead of waiting 5 mins (the default). The second property says to index those specific fields for all provenance events.

The third property enables the Provenance Data content viewer.

The other 2 properties indicate that the content should be kept on the box for up to 24 hours, but to delete content if the disk is 80% full.


After changing those, you'd need to restart your system.
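
Before restarting, it can be worth double-checking that the edits actually landed in the file. The following is only a rough sketch: `parse_properties` and `find_mismatches` are hypothetical helper names (not part of NiFi), the parser handles only plain `key=value` lines, and the `conf/nifi.properties` path in the usage comment is an assumption about where your install keeps the file.

```python
# Hypothetical helpers, not part of NiFi: check that the five properties
# discussed above have the suggested values in a nifi.properties file.

EXPECTED = {
    "nifi.provenance.repository.rollover.time": "30 secs",
    "nifi.provenance.repository.indexed.fields":
        "EventType, FlowFileUUID, Filename, ProcessorID",
    "nifi.content.viewer.url": "/nifi-content-viewer/",
    "nifi.content.repository.archive.max.retention.period": "24 hours",
    "nifi.content.repository.archive.max.usage.percentage": "80%",
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def normalize(value):
    """Ignore spacing around commas so 'a, b' and 'a,b' compare equal."""
    return ",".join(part.strip() for part in value.split(","))


def find_mismatches(text, expected=EXPECTED):
    """Return {property: actual value or None} for every property that differs."""
    props = parse_properties(text)
    mismatches = {}
    for key, wanted in expected.items():
        actual = props.get(key)
        if actual is None or normalize(actual) != normalize(wanted):
            mismatches[key] = actual
    return mismatches


# Example usage (the path is an assumption about your install):
# with open("conf/nifi.properties", encoding="utf-8") as f:
#     print(find_mismatches(f.read()))
```

An empty dictionary means all five properties match; otherwise each key maps to the value currently in the file, or `None` if the property is missing.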

So I'm suggesting that you do that so that you can make use of NiFi's data provenance to debug workflows. It's a super powerful feature.

Then, you can click on the Data Provenance icon in the UI (4th icon in the toolbar on the very right-hand side). Then click "Search". You can search by filename or whatever. If you just want to find data coming from the Twitter processor, you can enter that for the "Component ID" (to get the id of that processor, right-click on it and choose "Configure"; it's on the Settings tab).

Then when you search you can see up to 1000 results. Click the little icon on the right-hand side that looks a bit like a propeller (it's actually intended to show a graph/tree). From there you can see what happened to the data as it went through your flow. For any of those events, you can right-click and "View Details". This will show you all sorts of info about the event. In the Content tab, you can click "View" to see what the content looked like at that point in time. You can then go back to the lineage view and look at the next or previous event and do the same thing until you know exactly where it changed.
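
When comparing the content at two events, it may also help to check whether the data really changed or only its escaping did. The sketch below assumes the content is JSON (an assumption about your flow); `same_json_value` is a hypothetical helper name and the `text` field is made up for illustration. In JSON, a `\uXXXX` escape and the literal character represent the same string, so the two forms parse to equal values.

```python
import json


def same_json_value(before, after):
    """True if both texts parse to the same JSON value, regardless of
    whether non-ASCII characters are written literally or as escapes."""
    return json.loads(before) == json.loads(after)


# The escaped string from the original question, wrapped in a made-up field.
escaped = '{"text": "\\u0432\\u0430\\u0436\\u043d\\u0435\\u0435"}'
literal = '{"text": "важнее"}'

# Compare the two snapshots as parsed values rather than raw text.
print(same_json_value(escaped, literal))

# Re-serialize without forcing ASCII to see the characters themselves.
print(json.dumps(json.loads(escaped), ensure_ascii=False))
```

If the two snapshots compare equal this way, the step in between only re-encoded the JSON; if they differ, the data itself was altered at that step.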


Hope this helps!

Let us know if you have any further questions.

Thanks
-Mark

------ Original Message ------
From: "Adam Estrada" <[email protected]>
To: [email protected]
Sent: 4/30/2015 2:20:56 PM
Subject: Maintain character encoding in workflow

All,
I am coming across an issue where my unicode characters are being converted to their unicode point representations (as javascript escapes) like this "\u0432\u0430\u0436\u043d\u0435\u0435". This is happening with Twitter data that is collected using the Twitter processor. How can I debug my workflow to figure out where the characters are being converted?

Thanks,
Adam
