Re: How to send the success status to GetFile processor?

2017-02-02 Thread prabhu Mahendran
Oleg,

Thanks for your response.

Is it possible to use a directory in FetchFile or any other processor? I
don't know the name of the file stored inside the directory.


*${absolute.path}\$(unknown)*
*Note:* while the GetFile processor doesn't have upstream connections, I
need a processor that fetches the file inside a directory using the
directory name, without giving the filename.

Many thanks

On Thu, Feb 2, 2017 at 7:37 PM, Oleg Zhurakousky <
ozhurakou...@hortonworks.com> wrote:

> Prabhu
>
> Not sure I fully understand.
> While indeed GetFile does not allow for an incoming connection, it does
> allow for your use case to happen indirectly by monitoring a predefined
> directory. So, one PG finishes and produces a file that is being put into a
> directory monitored by another PG’s GetFile.
>
> Am I missing something?
>
> Cheers
> Oleg


Re: Writing back through a python stream callback when the flowfile content is a mix of character and binary

2017-02-02 Thread James McMahon
Thank you very much Matt. I would be most interested in any insights you
gain if you are able to recreate the problem.

If you have a moment, can you offer up a line of code showing how one might
wrap a call around the byte stream to treat the bytes as a string that can
be matched against using, for instance, a compiled re pattern? I will
definitely look more closely at the Oracle docs link you provided. An
example would help me when I tackle this.  -Jim
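Not speaking for Matt's eventual answer, but as one hedged sketch in plain Python (the pattern and sample bytes here are invented for illustration): decoding with 'latin-1' maps every byte 0-255 to exactly one code point, so mixed character/binary content survives a decode, match, re-encode round trip:

```python
import re

# Raw stream content: tagged text mixed with binary bytes.
raw = b"<name>alpha</name>\x00\xff\xfe<name>beta</name>"

# latin-1 maps each byte 0-255 to the same code point, so the
# decode is lossless and reversible even for binary runs.
text = raw.decode("latin-1")

# Match with an ordinary compiled pattern as if it were plain text.
pattern = re.compile(r"<name>(\w+)</name>")
names = pattern.findall(text)   # ['alpha', 'beta']

# Transform and re-encode; the binary bytes come back unchanged.
out = pattern.sub(lambda m: "<name>%s</name>" % m.group(1).upper(), text)
round_tripped = out.encode("latin-1")
```

The same idea should carry over to Jython by decoding the bytes read from the stream; a 'utf-8' decode, by contrast, can raise or mangle on arbitrary binary runs.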

On Thu, Feb 2, 2017 at 6:56 PM, Matt Burgess  wrote:

> James,
>
> If you'd rather work with the inputStream as bytes, you don't need the
> IOUtils.toString() call, and I'm not sure what a UTF-8 charset would
> do to your mixed data.  You can wrap any of the *InputStream
> decorators around the inputStream object, such as DataInputStream [1]
> to read various data types from the underlying bytes in the stream.
> Alternatively you may want to read all the bytes into an array you can
> work with directly via Jython methods instead of using Java I/O.
>
> What's weird about the TypeError is that it looks like it is calling a
> different write() method than I would've expected, I wonder if the
> translation of Jython to Java objects is somehow making the processor
> not be able to match up a method signature.  If the error is not
> occurring in the redacted code block above, I will give this script a
> try, to see if I can reproduce and/or fix the error.
>
> Regards,
> Matt
>
> [1] https://docs.oracle.com/javase/8/docs/api/java/io/DataInputStream.html

Re: Writing back through a python stream callback when the flowfile content is a mix of character and binary

2017-02-02 Thread Russell Bateman
There is also a /SplitContent/ processor. Assuming you can recognize the 
boundaries of the different data types, you can split them up into 
separate flowfiles. Then you /MergeContent/ them back together later.
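As a rough plain-Python analogy to that split/merge approach (the boundary marker and payload below are invented for illustration, not anything SplitContent prescribes), splitting on a recognizable delimiter yields separately processable segments that can be re-joined afterwards:

```python
# Hypothetical payload: a character header, a boundary marker,
# then a binary body -- stand-ins for SplitContent / MergeContent.
BOUNDARY = b"--BOUNDARY--"
payload = b"<header>meta</header>" + BOUNDARY + b"\x00\x01\x02\xff"

# "Split" into flowfile-like segments on the boundary.
header, body = payload.split(BOUNDARY, 1)

# Process only the character segment; leave the binary untouched.
header = header.replace(b"meta", b"META")

# "Merge" the segments back together.
merged = header + BOUNDARY + body
```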




Re: Writing back through a python stream callback when the flowfile content is a mix of character and binary

2017-02-02 Thread Matt Burgess
James,

If you'd rather work with the inputStream as bytes, you don't need the
IOUtils.toString() call, and I'm not sure what a UTF-8 charset would
do to your mixed data.  You can wrap any of the *InputStream
decorators around the inputStream object, such as DataInputStream [1]
to read various data types from the underlying bytes in the stream.
Alternatively you may want to read all the bytes into an array you can
work with directly via Jython methods instead of using Java I/O.

What's weird about the TypeError is that it looks like it is calling a
different write() method than I would've expected, I wonder if the
translation of Jython to Java objects is somehow making the processor
not be able to match up a method signature.  If the error is not
occurring in the redacted code block above, I will give this script a
try, to see if I can reproduce and/or fix the error.

Regards,
Matt

[1] https://docs.oracle.com/javase/8/docs/api/java/io/DataInputStream.html
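A hedged plain-Python sketch of the "work with the bytes directly" route (the sample data is invented, and the Jython/NiFi classes are left out): compile the regex as a bytes pattern and skip decoding entirely:

```python
import re

# Mixed content read from the stream into a byte array.
data = bytearray(b"\xfe\xffjunk<id>42</id>more\x00junk<id>7</id>")

# A bytes pattern (rb"...") matches against bytes-like objects
# without any charset conversion of the binary runs.
id_pattern = re.compile(rb"<id>(\d+)</id>")
ids = [int(m.decode("ascii")) for m in id_pattern.findall(bytes(data))]
```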



Re: Writing back through a python stream callback when the flowfile content is a mix of character and binary

2017-02-02 Thread James McMahon
This is very helpful, Russell, but in my case each file is a mix of data
types. So even if I determine that the flowfile is a mix, I'd still have to
be poised to tackle it in my ExecuteScript script. Good suggestion, though,
and one I can use in other ways in my workflows.

I do hope someone can tell me what I can do in my callback write-back to
handle it all. I'd like to better understand this error I'm getting, too.
 -Jim

On Thu, Feb 2, 2017 at 6:02 PM, Russell Bateman 
wrote:

> Could you use *RouteOnContent* to determine what sort of content you're
> dealing with, then branch to different *ExecuteScript* processors rigged
> to different Python scripts?
>
> Hope this comment is helpful.
>

Re: Many systems - big flow graphs?

2017-02-02 Thread Joe Witt
Uwe

Happy to have this be a longer dialogue; I just wanted to get a quick
response back.  You mentioned having unconnected flows separate from
each other.  Yes, this is absolutely how it is done for really massive,
large-scale sets of disparate flows.  The process group concept allows
you to logically create abstractions/groups of flows.  People tend to
name their process groups and processors in meaningful ways.  We have
the search bar in the top-right for fast-find of such components, even
if they are buried very deep in the flow.

This approach works quite well: it helps you keep things organized and
allows you to 'refactor' the flows by moving around and restructuring
groups, etc.

Thanks
Joe



Many systems - big flow graphs?

2017-02-02 Thread Uwe Geercken
Hello,

excuse my question, but I still have not fully understood how one would logically handle large flow graphs. If I have many systems involved and many different types of output, would I really put everything in one flow? Or is this a misunderstanding on my side? If you put everything in one flow, does it not get messy and hard to search and debug? That would have an influence on quality at some point in time.

Or would you logically separate things that do not really belong together and run them in flows on different servers, unconnected and separated from the others? Would that be the idea?

I would be happy to hear some of your thoughts or findings from your experience.

Rgds,

Uwe


Writing back through a python stream callback when the flowfile content is a mix of character and binary

2017-02-02 Thread James McMahon
I have a flowfile that has tagged character information I need to get at
throughout the first few sections of the file. I need to use regex in
python to select some of those values and to transform others. I am using
an ExecuteScript processor to execute my python code. Here is my approach:



= = = = =

class PyStreamCallback(StreamCallback):

    def __init__(self):
        pass

    def process(self, inputStream, outputStream):

        stuff = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        # what happens to my binary and extreme chars when they get
        # passed through this step?

        # . (transform and pick out select content)

        outputStream.write(bytearray(stuff.encode('utf-8')))
        # am I using the wrong functions to put my text chars and my
        # binary and my extreme chars back on the stream as a byte
        # stream? What should I be doing to handle the variety of data?

flowFile = session.get()

if flowFile != None:

    incoming = flowFile.getAttribute('filename')

    logging.info('about to process file: %s', incoming)

    flowFile = session.write(flowFile, PyStreamCallback())   # line 155 in my code

    session.transfer(flowFile, REL_SUCCESS)

    session.commit()



= = = = =



When my incoming flowfile is all character content - such as tagged xml -
my code works fine. All the flowfiles that also contain some binary data
and/or characters at the extremes such as foreign language characters don’t
work. They error out. I suspect it has to do with the way I am writing back
to the flowfile stream.



Here is the error I am getting:

org.apache.nifi.processor.exception.ProcessException:
javax.script.ScriptException: TypeError: write(): 1st arg can't be coerced
to int, byte[] in 
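For comparison, a hedged stand-alone sketch of the intended round trip, with io.BytesIO standing in for NiFi's input/output streams and invented content; editing in byte space sidesteps the charset question for the binary runs:

```python
import io
import re

def process(input_stream, output_stream):
    # Read the whole content as bytes; no charset assumption yet.
    raw = input_stream.read()
    # Edit with a bytes pattern, so binary runs pass through the
    # write-back untouched.
    edited = re.sub(rb"<tag>old</tag>", b"<tag>new</tag>", raw)
    # Write bytes back; this mirrors handing OutputStream.write()
    # a proper byte array rather than a str.
    output_stream.write(edited)

src = io.BytesIO(b"<tag>old</tag>\x00\xff\x80trailer")
dst = io.BytesIO()
process(src, dst)
```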

Nifi service stopped with no 'stop' request?

2017-02-02 Thread Cheryl Jennings
Hi Everyone,

I had nifi running overnight, communicating between two nodes.  When I
checked in the morning, the nifi service on one of the nodes had stopped.

Normally when issuing a 'service nifi stop', I see an entry in
nifi-bootstrap.log that says:
org.apache.nifi.bootstrap.Command Apache NiFi has accepted the Shutdown
Command and is shutting down now

But this time, I only see this message in nifi-app.log, and no indication
in nifi-bootstrap.log that
a Shutdown Command was issued:
org.apache.nifi.BootstrapListener Received SHUTDOWN request from Bootstrap

Is there any reason nifi would stop like this without an explicit `service
stop|restart`?

Thanks!
-Cheryl


Re: How to send the success status to GetFile processor?

2017-02-02 Thread Oleg Zhurakousky
Prabhu

Not sure I fully understand.
While indeed GetFile does not allow for an incoming connection, it does allow 
for your use case to happen indirectly by monitoring a predefined directory. 
So, one PG finishes and produces a file that is being put into a directory 
monitored by another PG’s GetFile.

Am I missing something?

Cheers
Oleg
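A rough stand-alone sketch of that handoff pattern (plain Python, with a temp directory standing in for the monitored one, and an invented file name): the consuming side only needs the directory, never the file name:

```python
import os
import tempfile

# A directory "monitored" by the downstream group's GetFile.
watched = tempfile.mkdtemp()

# The upstream group finishes and drops its result file there.
with open(os.path.join(watched, "result-001.out"), "w") as f:
    f.write("done")

# Downstream side: pick up whatever appeared, no name known ahead.
picked_up = sorted(os.listdir(watched))
contents = [open(os.path.join(watched, name)).read() for name in picked_up]
```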




How to send the success status to GetFile processor?

2017-02-02 Thread prabhu Mahendran
Consider the below scenario:

ProcessGroupA->ProcessGroupB


My ProcessGroupA ends with an ExecuteProcess processor that runs a console
application and saves its result into a directory. In ProcessGroupB, I will
process each file in the saved directory using the GetFile processor.


Once ProcessGroupA is completed, I want to run ProcessGroupB, which starts
with the GetFile processor. Since the GetFile processor doesn't have an
upstream connection, I couldn't run the flow here. How do I send the
success status to the GetFile processor?


Note: Since I don't know the filename, the FetchFile processor is not
suitable for my case.