Re: Reading flowfile in a stream callback

2017-11-02 Thread Andy LoPresto
James,

The Python API should be the same as the Java FlowFile.java interface [1]. Matt
Burgess’ blog has a good post [2] about using Jython to do flowfile content
manipulation. Something like:

flowFile = session.get()
if (flowFile != None):
  flowFile = session.write(flowFile,PyStreamCallback())
  session.transfer(flowFile, REL_SUCCESS)

With PyStreamCallback declared as a class above that block in the script:

import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

class PyStreamCallback(StreamCallback):
  def __init__(self):
    pass
  def process(self, inputStream, outputStream):
    text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
    reversedText = text[::-1]
    outputStream.write(bytearray(reversedText.encode('utf-8')))

In Groovy, you can declare the StreamCallback as an inline closure to make this 
more compact, but I believe in Jython it needs to be a separate declaration. 
Hope this helps.
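
The reversal logic above can also be tried outside NiFi. This is only a rough sketch in plain Python: io.BytesIO objects stand in for the Java streams, and ReverseCallback and run_callback are hypothetical names used for illustration, not part of the NiFi API.

```python
import io


def reverse_utf8(data):
    """Decode UTF-8 bytes, reverse the characters, and re-encode them."""
    return data.decode('utf-8')[::-1].encode('utf-8')


class ReverseCallback(object):
    """Stand-in with the same process() shape as the Jython StreamCallback."""

    def process(self, input_stream, output_stream):
        # Read the whole payload, transform it, and write the result back.
        output_stream.write(reverse_utf8(input_stream.read()))


def run_callback(callback, payload):
    """Mimic session.write(): feed bytes in, collect the bytes written out."""
    source = io.BytesIO(payload)
    sink = io.BytesIO()
    callback.process(source, sink)
    return sink.getvalue()
```

Calling run_callback(ReverseCallback(), some_bytes) returns the bytes that would become the new flowfile content.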

[1] https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/flowfile/FlowFile.java
[2] https://funnifi.blogspot.com/2016/03/executescript-json-to-json-revisited_14.html

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Nov 2, 2017, at 12:53 PM, James McMahon  wrote:
> 
> In Python, I can use the requests library to post content something like this:
> 
> import requests
> url="https://abc.test.org"
> files={'file':open('/somedir/myfile.txt','rb')}
> r = requests.post(url,files=files)
> 
> If I am in a python stream callback, how can I read the flowfile payload in 
> the same way that the open() reads its file from disk?




Reading flowfile in a stream callback

2017-11-02 Thread James McMahon
In Python, I can use the requests library to post content something like
this:

import requests
url="https://abc.test.org"
files={'file':open('/somedir/myfile.txt','rb')}
r = requests.post(url,files=files)

If I am in a python stream callback, how can I read the flowfile payload in
the same way that the open() reads its file from disk?
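
One possible approach, sketched under assumptions: requests accepts any file-like object in files=, so the payload does not have to come from open(). In a stream callback the bytes would first have to be read from the inputStream (for example with IOUtils.toByteArray) and converted to a Python byte string. Whether requests can be imported in the ExecuteScript Jython engine is not confirmed here. payload_as_file and build_files_arg are hypothetical helper names.

```python
import io


def payload_as_file(payload):
    """Wrap raw flowfile bytes in a file-like object, like open(path, 'rb')."""
    return io.BytesIO(payload)


def build_files_arg(payload, filename='flowfile.bin'):
    """Build the dict passed as files= to requests.post().

    requests accepts a (filename, file-like object) tuple for each field.
    """
    return {'file': (filename, payload_as_file(payload))}
```

The result could then be passed as requests.post(url, files=build_files_arg(data)).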


Issue with Executescript

2017-11-02 Thread N, Vyshali
Hi,

I'm using the ExecuteScript processor to generate some fake data using the
"Faker" package and replace it in the original data. I have attached the
script for your reference.

import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback
import unicodecsv as csv
from faker import Factory
from collections import defaultdict

class TransformCallback(StreamCallback):
    def _init_(self):
        pass

    def process(self,inputStream,outputStream):
        text = IOUtils.toString(inputStream,StandardCharsets.ISO_8859_1)
        faker  = Factory.create()//generating fake data
        names  = defaultdict(faker.name)
        emails = defaultdict(faker.email)
        ssns = defaultdict(faker.ssn)
        phone_numbers = defaultdict(faker.phone_number)

        for row in text.splitlines():
            row["name"]  = names[row["name"]] //Assigning the fake data
            row["email"] = emails[row["email"]]
            row["ssn"] = ssns[row["ssn"]]
            row["phone_number"] = phone_numbers[row["phone_number"]]
            flowFile = session.putAttribute(flowFile,"name",row["name"])

        outputStream.write(text.encode('UTF8'))


flowFile = session.get()
if flowFile != None:
    flowFile = session.write(flowFile,TransformCallback())
    session.transfer(flowFile, REL_SUCCESS)
    session.commit()

But I'm unable to execute it successfully. I'm getting the following error:
"ProcessException:TypeError:None required"

I'm not very familiar with Python. Please give me suggestions on how I can
solve this, and correct me if my coding is not appropriate.

Regards,
Vyshali
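
A few things in the script stand out regardless of the reported error: `//` is not a Python comment marker (use `#`), the constructor name is `__init__` with double underscores, and the items returned by splitlines() are plain strings, so row["name"] cannot work until each line is parsed into a dict. Which of these, if any, produces "None required" is not certain. The sketch below shows only the consistent-replacement idea in plain Python 3 with the standard csv module. make_fake_source is a hypothetical stand-in for faker.name, faker.email and so on, and the column names are taken from the script above.

```python
import csv
import io
import itertools
from collections import defaultdict


def make_fake_source(prefix):
    """Hypothetical stand-in for a Faker provider such as faker.name."""
    counter = itertools.count(1)
    return lambda: '%s-%d' % (prefix, next(counter))


def anonymize_csv(text, columns=('name', 'email', 'ssn', 'phone_number')):
    """Replace the listed columns with fake values, consistently per original."""
    # One mapping per column: the same original always gets the same fake value.
    fakes = {col: defaultdict(make_fake_source(col)) for col in columns}
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            lineterminator='\n')
    writer.writeheader()
    for row in reader:  # each row is a dict keyed by the header names
        for col in columns:
            if col in row:
                row[col] = fakes[col][row[col]]
        writer.writerow(row)
    return out.getvalue()
```

Inside a StreamCallback, the returned text would be what gets encoded and written to the outputStream. Setting flowfile attributes would happen outside the callback, next to session.write().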




SplitAvro fails to split data

2017-11-02 Thread Pradip
I have a NiFi flow that queries a relational table that has a latestUpdate
date/timestamp column and forwards data in that table that was
inserted/modified in the last 10 min to a JMS queue. The flow is something
like this:

ExecuteSQL --> SplitAvro --> ConvertAvroToJSON --> some more processing --> PublishJMS

The problem I am running into is that if ExecuteSQL returns a single row of
data, SplitAvro doesn't forward that single row on the split relationship
downstream.

SplitAvro has the following properties set:

Split Strategy: Record
Output Size: 1
Output Strategy: Datafile
Transfer Metadata: true

Is this a bug in the splitAvro code, because I would expect the data to get 
split if there is 1 or more records in the flowfile?
I am running on NiFi 1.0

Thanks,
Pradip
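
The behaviour expected above can be written down as a small sketch. This is not SplitAvro's implementation, only plain Python illustrating the expectation that an input with a single record still yields one split. split_records is a hypothetical name.

```python
def split_records(records, output_size=1):
    """Chunk records into groups of at most output_size records each."""
    if output_size < 1:
        raise ValueError('output_size must be at least 1')
    return [records[i:i + output_size]
            for i in range(0, len(records), output_size)]
```

With an Output Size of 1, a one-record input would be expected to produce exactly one split containing that record.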


What would be a good way to build NoSQL entity from SQL data?

2017-11-02 Thread Eric Chaves
Hi fellows,

I'm planning a flow to feed data from a relational database into
elasticsearch and would like to ask for some advice.

The database has an entity Person with 1:N relation to Email and Phone
(email and phone are objects, not just scalar values).

The flow needs to feed ES with a JSON object representing the Person and
their Emails and Phones (like below), and when any table record gets
updated the flow needs to "refresh" the JSON entity to update the
corresponding ES document.

-- Person.json
{
  name: 'Eric',
  age: 43,
  phones: [{number: '11--', qualified: false, updatedAt: '2017-11-29 10:01:13'}],
  emails: [{address: 'e...@domain.org', qualified: true, updatedAt: '2017-11-10 09:12:35'}],
  updatedAt: '2017-11-30 17:29:53'
}

My first thought was to write a flow that monitors changes on the Person,
Email and Phone tables using the QueryDatabaseTable processor and, once a
change is detected, routes the record (which always contains a PersonId) to
a processor that performs multiple queries to build the Person JSON. So far
the only way I could find to do it is through script processors.

Given that there are a lot of new record-based processors with schema
support, I was wondering if there is a better approach or an idiomatic way
of performing this kind of "SQL lookup" in NiFi.

What do you think?

Cheers,

Eric
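
The "multiple queries to build the Person JSON" step can be sketched as follows. This is a rough illustration only: sqlite3 stands in for the relational database, and the table and column names (person, phone, email, personId) are assumptions based on the JSON above, not a known schema.

```python
import sqlite3


def build_person_document(conn, person_id):
    """Assemble one nested Person document from the three tables."""
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    person = conn.execute(
        'SELECT name, age, updatedAt FROM person WHERE id = ?',
        (person_id,)).fetchone()
    if person is None:
        return None

    def children(table, value_column):
        # Table and column names are fixed strings from this function,
        # never user input, so formatting them into the SQL is safe here.
        query = ('SELECT %s, qualified, updatedAt FROM %s WHERE personId = ?'
                 % (value_column, table))
        items = []
        for row in conn.execute(query, (person_id,)):
            item = dict(row)
            item['qualified'] = bool(item['qualified'])  # SQLite stores 0/1
            items.append(item)
        return items

    doc = dict(person)
    doc['phones'] = children('phone', 'number')
    doc['emails'] = children('email', 'address')
    return doc
```

The returned dict can be serialized with json.dumps() and sent to Elasticsearch. A change detected on any of the three tables only needs the PersonId to trigger a rebuild of the whole document.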


Re: NiFi Processor Lifecycle Diagram

2017-11-02 Thread Pierre Villard
Hi,

There is actually a JIRA for that [1]. Feel free to add attachments to this
JIRA or submit a PR with your work.
I'm sure someone will review and get it merged into our documentation!

[1] https://issues.apache.org/jira/browse/NIFI-1345

Thanks

2017-11-02 15:51 GMT+01:00 Ryan H :

> Thanks for links.
>
>   I've whiteboarded some stuff, but nothing serious.  I'll be sure to
> share if I get something going.
>
> Ryan
>
> On Mon, Oct 30, 2017 at 7:18 PM, Andy LoPresto 
> wrote:
>
>> Hi Ryan,
>>
>> I am not aware of a diagram, but the best resource for the component
>> lifecycle is the Developer Guide section on it [1]. The “Apache NiFi In
>> Depth” document may also provide some good information in the context of
>> flowfile lifecycle [2]. If you do make a diagram of this, I’m sure it would
>> benefit the community. Thanks.
>>
>> [1] https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html#component-lifecycle
>> [2] https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html#life-of-a-flowfile
>>
>> Andy LoPresto
>> alopre...@apache.org
>> alopresto.apa...@gmail.com
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>
>> On Oct 30, 2017, at 11:29 AM, Ryan H  wrote:
>>
>> Hi,
>>   I'm looking for a NiFi Processor Lifecycle Diagram to show when the
>> lifecycle events get kicked off, such as @OnPropertyModified, or
>> @Started/Stopped, etc.
>>
>>   I've read through the docs a few times, although it's been a while
>> since I've really scoured all the links, I can't seem to find one.
>>
>>   Anyone have a link to one?  (I would totally print this and hang it up
>> in the office.)
>>
>> Thanks,
>> Ryan
>>
>>
>>
>


Re: NiFi Processor Lifecycle Diagram

2017-11-02 Thread Ryan H
Thanks for links.

   I've whiteboarded some stuff, but nothing serious.  I'll be sure to
share if I get something going.

Ryan

On Mon, Oct 30, 2017 at 7:18 PM, Andy LoPresto  wrote:

> Hi Ryan,
>
> I am not aware of a diagram, but the best resource for the component
> lifecycle is the Developer Guide section on it [1]. The “Apache NiFi In
> Depth” document may also provide some good information in the context of
> flowfile lifecycle [2]. If you do make a diagram of this, I’m sure it would
> benefit the community. Thanks.
>
> [1] https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html#component-lifecycle
> [2] https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html#life-of-a-flowfile
>
> Andy LoPresto
> alopre...@apache.org
> alopresto.apa...@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Oct 30, 2017, at 11:29 AM, Ryan H  wrote:
>
> Hi,
>   I'm looking for a NiFi Processor Lifecycle Diagram to show when the
> lifecycle events get kicked off, such as @OnPropertyModified, or
> @Started/Stopped, etc.
>
>   I've read through the docs a few times, although it's been a while
> since I've really scoured all the links, I can't seem to find one.
>
>   Anyone have a link to one?  (I would totally print this and hang it up
> in the office.)
>
> Thanks,
> Ryan
>
>
>


HandleHttpRequest

2017-11-02 Thread Juan Sequeiros
Hello all,

Is it considered a best practice to leave the thread count on
HandleHttpRequest at 1, assuming the Jetty server is handling requests?
Are we making things worse by increasing it?

We are moving towards using HandleHttpRequest instead of ListenHTTP, but we
are struggling with its performance compared to ListenHTTP.

We are receiving lots of data (in the thousands of flowfiles) and we are
seeing many messages about "unable to create content claim due to
org.eclipse.jetty.EoFexception"

Our flow pretty much routes success from HandleHttpRequest straight to
HandleHttpResponse, and then we do the rest of our flow processing on the
flowfile.

I've noticed some examples have the flow go through its life cycle and then
send the HTTP response. Are we sending the response too soon? (I don't think
so.)