Re: How can datetime to month conversion failed in french language?

2017-04-06 Thread prabhu Mahendran
jeff,

Thanks for your reply.

The attribute 'ds' has the value '07/04/2017'.

I convert that into a month name using UpdateAttribute:

${ds:toDate('dd/MM/'):format('MMM')}.

If I use that expression on Windows with the language set to English (India), it works.

If I use it on Windows with the language set to French, it does not work.

Can you suggest any way to solve that problem?
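Month-name formatting is locale-sensitive, in the JVM and elsewhere. A minimal Python sketch of the same effect (the date is taken from the thread; the locale names are illustrative):

```python
import datetime
import locale

# Parse the same date the thread discusses.
d = datetime.datetime.strptime("07/04/2017", "%d/%m/%Y")

# Month abbreviations follow the process locale, much as NiFi's
# format('MMM') follows the JVM's default locale.
locale.setlocale(locale.LC_TIME, "C")
print(d.strftime("%b"))  # Apr

try:
    locale.setlocale(locale.LC_TIME, "fr_FR.UTF-8")
    print(d.strftime("%b"))  # 'avr.' where a French locale is installed
except locale.Error:
    pass  # the French locale may not be available on this machine
```

If the JVM's default locale is the cause, one commonly suggested workaround (an assumption, not verified against this NiFi version) is to force the locale in NiFi's bootstrap.conf, e.g. adding `-Duser.language=en` and `-Duser.country=US` as `java.arg.N` entries.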

On Fri, Apr 7, 2017 at 1:28 AM, Jeff  wrote:

> What is the expression language statement that you're attempting to use?
>
> On Thu, Apr 6, 2017 at 3:12 AM prabhu Mahendran 
> wrote:
>
>> In NiFi, how does the JVM check the language of the machine?
>>
>> Does it take a default language like English (US), or the system's
>> selected date-time language?
>>
>> I face an issue converting a datetime into a month name using the
>> expression language, with the NiFi package installed on a French OS.
>>
>> But it worked with English (US) as the selected language.
>>
>> Can anyone help me to resolve this?
>>
>


Re: Options for increasing performance?

2017-04-06 Thread Andre
Scott,

The slowness, if I recall correctly, is mostly related to Jython
initialization time.

There have been some discussions in the past about this:

https://lists.apache.org/thread.html/a4a56c91e43857df0e2e38797585d0f496f8723b550d6b866f2e28e4@


Cheers


On 6 Apr 2017 06:26, "Scott Wagner"  wrote:

> In my experience, using ExecuteScript with Python to process a single
> FlowFile per invocation is very inefficient when multiple FlowFiles are
> waiting in the input queue, even with the run schedule set to a timer of
> 0 sec.
>
> Instead, I have the following in all of my Python scripts:
>
> flowFiles = session.get(10)
> for flowFile in flowFiles:
>     if flowFile is None:
>         continue
>     # Do stuff here
>
> That seems to improve the throughput of the ExecuteScript processor
> dramatically.
>
> YMMV
>
> - Scott
>
> James McMahon 
> Wednesday, April 5, 2017 12:48 PM
> I am receiving POSTs from a Pentaho process, delivering files to my NiFi
> 0.7.x workflow HandleHttpRequest processor. That processor hands the
> flowfile off to an ExecuteScript processor that runs a python script. This
> script is very, very simple: it takes an incoming JSON object and loads it
> into a Python dictionary, and verifies the presence of required fields
> using simple has_key checks on the dictionary. There are only eight fields
> in the incoming JSON object.
>
> The throughput for these two processes is not exceeding 100-150 files in
> five minutes. It seems very slow in light of the minimal processing going
> on in these two steps.
>
> I notice that there are configuration operations seemingly related to
> optimizing performance. "Concurrent tasks", for example,  is only set by
> default to 1 for each processor.
>
> What performance optimizations at the processor level do users recommend?
> Is it advisable to crank up the concurrent tasks for a processor, and is
> there an optimal performance point beyond which you should not crank up
> that value? Are there trade-offs?
>
> I am particularly interested in optimizations for HandleHttpRequest and
> ExecuteScript processors.
>
> Thanks in advance for your thoughts.
>
> cheers,
>
> Jim
>
>
>
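Jim's required-field check can be sketched in plain Python (the field names here are hypothetical). Note that `dict.has_key()` exists only in Python 2/Jython, while the `in` operator works in both:

```python
import json

REQUIRED_FIELDS = ["id", "timestamp", "source"]  # hypothetical names

def missing_fields(payload, required=REQUIRED_FIELDS):
    """Return the required fields absent from a JSON object payload."""
    data = json.loads(payload)
    # 'field in data' replaces the Python-2-only data.has_key(field)
    return [field for field in required if field not in data]

print(missing_fields('{"id": 1, "timestamp": "2017-04-05"}'))  # ['source']
```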


Re: Help with SFTP processor

2017-04-06 Thread James Keeney
Thanks.

On Thu, Apr 6, 2017 at 5:24 PM Joe Witt  wrote:

> I just mean while designing/interacting with the flow you can
> start/stop processors, you can click on the connections and say 'list
> queue' and then you can click on each object in the queue and see its
> attributes and content.  Really helps step through the flow at each
> step.
>
> On Thu, Apr 6, 2017 at 5:08 PM, James Keeney  wrote:
> > Thanks again. When you refer to live queue listing and data viewing what
> are
> > you referring to? The dashboard or something else.
> >
> > Jim K.
> >
> > On Thu, Apr 6, 2017 at 4:49 PM Joe Witt  wrote:
> >>
> >> No problem.  Remember you can use live queue listing and data viewing
> >> to see all the attributes we know about the object at each stage.
> >> That is exactly how I figured out how to wire this together and what I
> >> needed from each step.
> >>
> >> Thanks
> >> Joe
> >>
> >> On Thu, Apr 6, 2017 at 4:43 PM, James Keeney 
> wrote:
> >> > Thank you. That was the final detail I was not getting. The use of the
> >> > ${path} expression variable. I now see that I needed to look at the
> >> > writes
> >> > attributes section of the ListFile processor.
> >> >
> >> > Jim K.
> >> >
> >> > On Thu, Apr 6, 2017 at 2:30 PM Joe Witt  wrote:
> >> >>
> >> >> Jim,
> >> >>
> >> >> Yep I understand your question and how to support that is what I was
> >> >> trying to convey.
> >> >>
> >> ListFile should pull from "/home/source".  Let's say it finds the file
> >> '/home/source/test/newfile.txt'.
> >> >>
> >> >> The resulting flowfile will have an attribute called 'path' that says
> >> >> 'test'
> >> >>
> >> >> Then you use FetchFile to actually pull in the bytes.  'path' still
> >> >> says
> >> >> 'test'
> >> >>
> >> >> Then you use PutSFTP with the 'Remote Path' set to
> "/www/files/${path}"
> >> >>
> >> >> I have just verified that this works myself locally using
> >> >> List/Fetch/PutFile.  In your case you'd use PutSFTP.
> >> >>
> >> >> Thanks
> >> >> Joe
> >> >>
> >> >> On Thu, Apr 6, 2017 at 12:51 PM, James Keeney 
> >> >> wrote:
> >> >> > Thanks for getting back to me. I will follow up on the
> documentation
> >> >> > Pull
> >> >> > Request.
> >> >> >
> >> >> > As to the directory question, I wasn't specific enough. I've
> already
> >> >> > configured the setting you described.
> >> >> >
> >> >> > Here is what is going on:
> >> >> >
> >> >> > Say the source directory is /home/source and the destination is
> >> >> > /www/files
> >> >> >
> >> >> > This works:
> >> >> >
> >> >> > If a user drops the file text.txt into /home/source then I want
> that
> >> >> > to
> >> >> > be
> >> >> > /www/files/text.txt That is working as expected.
> >> >> >
> >> >> > This does not work
> >> >> >
> >> >> > If a user creates a subdirectory and drops a file, so
> >> >> > /home/source/test/newfile.txt then I want the destination to
> reflect
> >> >> > the
> >> >> > subdirectory as in /www/files/test/newfile.txt But what happens is
> >> >> > the
> >> >> > file
> >> >> > is being placed into the destination directory without the new
> >> >> > subdirectory.
> >> >> > So what is getting created is /www/files/newfile.txt and not
> >> >> > /www/files/test/newfile.txt
> >> >> >
> >> >> > Any suggestions?
> >> >> >
> >> >> > Jim K.
> >> >> >
> >> >> >
> >> >> > On Thu, Apr 6, 2017 at 12:19 PM Joe Witt 
> wrote:
> >> >> >>
> >> >> >> Jim,
> >> >> >>
> >> >> >> Glad you've made progress on the SFTP side.  Please file a JIRA
> with
> >> >> >> your suggestions for the docs and the ideal case then is you'd
> file
> >> >> >> a
> >> >> >> Pull Request
> >> >> >> (
> https://cwiki.apache.org/confluence/display/NIFI/Contributor+Guide)
> >> >> >> which actually provides the suggested documentation changes.
> >> >> >>
> >> >> >> For the ListFile/FetchFile -> PutSFTP[1] side the key property on
> >> >> >> PutSFTP to set is 'Remote Path'.  You'll want this value to have
> the
> >> >> >> base directory you need to write to which could be './' or could
> be
> >> >> >> 'some/place/to/write/to' and you'll also want it to reflect the
> >> >> >> directory structure from which you fetched the file locally.  This
> >> >> >> will be available to you from the 'path' attribute of the
> flowfile.
> >> >> >> This is set by the ListFile processor (see writes attributes) [2].
> >> >> >>
> >> >> >> So putting these together you want your PutSFTP processor to have
> as
> >> >> >> a
> >> >> >> value for 'Remote Path' something like
> "thebasedir/fordata/${path}".
> >> >> >>
> >> >> >>
> >> >> >> [1]
> >> >> >>
> >> >> >>
> >> >> >>
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.PutSFTP/index.html
> >> >> >> [2]
> >> >> >>
> >> >> >>
> >> >> >>
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ListFile/index.html
> >> >> >>
> >> >> >> On Thu, Apr 6, 2017 at 11:43 AM, James Keeney <
> nextves...@gmail.com>
> >> >> >> wrote:
> >> >> >> > Joe and Juan -
> >> >> >> >
> >> >> >> > Thank you v

Re: Help with SFTP processor

2017-04-06 Thread Joe Witt
I just mean while designing/interacting with the flow you can
start/stop processors, you can click on the connections and say 'list
queue' and then you can click on each object in the queue and see its
attributes and content.  Really helps step through the flow at each
step.

On Thu, Apr 6, 2017 at 5:08 PM, James Keeney  wrote:
> Thanks again. When you refer to live queue listing and data viewing what are
> you referring to? The dashboard or something else.
>
> Jim K.
>
> On Thu, Apr 6, 2017 at 4:49 PM Joe Witt  wrote:
>>
>> No problem.  Remember you can use live queue listing and data viewing
>> to see all the attributes we know about the object at each stage.
>> That is exactly how I figured out how to wire this together and what I
>> needed from each step.
>>
>> Thanks
>> Joe
>>
>> On Thu, Apr 6, 2017 at 4:43 PM, James Keeney  wrote:
>> > Thank you. That was the final detail I was not getting. The use of the
>> > ${path} expression variable. I now see that I needed to look at the
>> > writes
>> > attributes section of the ListFile processor.
>> >
>> > Jim K.
>> >
>> > On Thu, Apr 6, 2017 at 2:30 PM Joe Witt  wrote:
>> >>
>> >> Jim,
>> >>
>> >> Yep I understand your question and how to support that is what I was
>> >> trying to convey.
>> >>
>> >> ListFile should pull from "/home/source".  Let's say it finds the file
>> >> '/home/source/test/newfile.txt'.
>> >>
>> >> The resulting flowfile will have an attribute called 'path' that says
>> >> 'test'
>> >>
>> >> Then you use FetchFile to actually pull in the bytes.  'path' still
>> >> says
>> >> 'test'
>> >>
>> >> Then you use PutSFTP with the 'Remote Path' set to "/www/files/${path}"
>> >>
>> >> I have just verified that this works myself locally using
>> >> List/Fetch/PutFile.  In your case you'd use PutSFTP.
>> >>
>> >> Thanks
>> >> Joe
>> >>
>> >> On Thu, Apr 6, 2017 at 12:51 PM, James Keeney 
>> >> wrote:
>> >> > Thanks for getting back to me. I will follow up on the documentation
>> >> > Pull
>> >> > Request.
>> >> >
>> >> > As to the directory question, I wasn't specific enough. I've already
>> >> > configured the setting you described.
>> >> >
>> >> > Here is what is going on:
>> >> >
>> >> > Say the source directory is /home/source and the destination is
>> >> > /www/files
>> >> >
>> >> > This works:
>> >> >
>> >> > If a user drops the file text.txt into /home/source then I want that
>> >> > to
>> >> > be
>> >> > /www/files/text.txt That is working as expected.
>> >> >
>> >> > This does not work
>> >> >
>> >> > If a user creates a subdirectory and drops a file, so
>> >> > /home/source/test/newfile.txt then I want the destination to reflect
>> >> > the
>> >> > subdirectory as in /www/files/test/newfile.txt But what happens is
>> >> > the
>> >> > file
>> >> > is being placed into the destination directory without the new
>> >> > subdirectory.
>> >> > So what is getting created is /www/files/newfile.txt and not
>> >> > /www/files/test/newfile.txt
>> >> >
>> >> > Any suggestions?
>> >> >
>> >> > Jim K.
>> >> >
>> >> >
>> >> > On Thu, Apr 6, 2017 at 12:19 PM Joe Witt  wrote:
>> >> >>
>> >> >> Jim,
>> >> >>
>> >> >> Glad you've made progress on the SFTP side.  Please file a JIRA with
>> >> >> your suggestions for the docs and the ideal case then is you'd file
>> >> >> a
>> >> >> Pull Request
>> >> >> (https://cwiki.apache.org/confluence/display/NIFI/Contributor+Guide)
>> >> >> which actually provides the suggested documentation changes.
>> >> >>
>> >> >> For the ListFile/FetchFile -> PutSFTP[1] side the key property on
>> >> >> PutSFTP to set is 'Remote Path'.  You'll want this value to have the
>> >> >> base directory you need to write to which could be './' or could be
>> >> >> 'some/place/to/write/to' and you'll also want it to reflect the
>> >> >> directory structure from which you fetched the file locally.  This
>> >> >> will be available to you from the 'path' attribute of the flowfile.
>> >> >> This is set by the ListFile processor (see writes attributes) [2].
>> >> >>
>> >> >> So putting these together you want your PutSFTP processor to have as
>> >> >> a
>> >> >> value for 'Remote Path' something like "thebasedir/fordata/${path}".
>> >> >>
>> >> >>
>> >> >> [1]
>> >> >>
>> >> >>
>> >> >> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.PutSFTP/index.html
>> >> >> [2]
>> >> >>
>> >> >>
>> >> >> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ListFile/index.html
>> >> >>
>> >> >> On Thu, Apr 6, 2017 at 11:43 AM, James Keeney 
>> >> >> wrote:
>> >> >> > Joe and Juan -
>> >> >> >
>> >> >> > Thank you very much for the help. It turned out to be the prompt
>> >> >> > for
>> >> >> > verifying the authenticity of the host.
>> >> >> >
>> >> >> > With that fixed, I have a new question:
>> >> >> >
>> >> >> > I'm using ListFile and FetchFile to identify new files as they are
>> >> >> > added
>> >> >> > to
>> >> >> > a directory. When they are I am using SFTP to transfer to another
>> >

Re: Help with SFTP processor

2017-04-06 Thread James Keeney
Thanks again. When you refer to live queue listing and data viewing, what
are you referring to? The dashboard, or something else?

Jim K.

On Thu, Apr 6, 2017 at 4:49 PM Joe Witt  wrote:

> No problem.  Remember you can use live queue listing and data viewing
> to see all the attributes we know about the object at each stage.
> That is exactly how I figured out how to wire this together and what I
> needed from each step.
>
> Thanks
> Joe
>
> On Thu, Apr 6, 2017 at 4:43 PM, James Keeney  wrote:
> > Thank you. That was the final detail I was not getting. The use of the
> > ${path} expression variable. I now see that I needed to look at the
> writes
> > attributes section of the ListFile processor.
> >
> > Jim K.
> >
> > On Thu, Apr 6, 2017 at 2:30 PM Joe Witt  wrote:
> >>
> >> Jim,
> >>
> >> Yep I understand your question and how to support that is what I was
> >> trying to convey.
> >>
> >> ListFile should pull from "/home/source".  Let's say it finds the file
> >> '/home/source/test/newfile.txt'.
> >>
> >> The resulting flowfile will have an attribute called 'path' that says
> >> 'test'
> >>
> >> Then you use FetchFile to actually pull in the bytes.  'path' still says
> >> 'test'
> >>
> >> Then you use PutSFTP with the 'Remote Path' set to "/www/files/${path}"
> >>
> >> I have just verified that this works myself locally using
> >> List/Fetch/PutFile.  In your case you'd use PutSFTP.
> >>
> >> Thanks
> >> Joe
> >>
> >> On Thu, Apr 6, 2017 at 12:51 PM, James Keeney 
> >> wrote:
> >> > Thanks for getting back to me. I will follow up on the documentation
> >> > Pull
> >> > Request.
> >> >
> >> > As to the directory question, I wasn't specific enough. I've already
> >> > configured the setting you described.
> >> >
> >> > Here is what is going on:
> >> >
> >> > Say the source directory is /home/source and the destination is
> >> > /www/files
> >> >
> >> > This works:
> >> >
> >> > If a user drops the file text.txt into /home/source then I want that
> to
> >> > be
> >> > /www/files/text.txt That is working as expected.
> >> >
> >> > This does not work
> >> >
> >> > If a user creates a subdirectory and drops a file, so
> >> > /home/source/test/newfile.txt then I want the destination to reflect
> the
> >> > subdirectory as in /www/files/test/newfile.txt But what happens is the
> >> > file
> >> > is being placed into the destination directory without the new
> >> > subdirectory.
> >> > So what is getting created is /www/files/newfile.txt and not
> >> > /www/files/test/newfile.txt
> >> >
> >> > Any suggestions?
> >> >
> >> > Jim K.
> >> >
> >> >
> >> > On Thu, Apr 6, 2017 at 12:19 PM Joe Witt  wrote:
> >> >>
> >> >> Jim,
> >> >>
> >> >> Glad you've made progress on the SFTP side.  Please file a JIRA with
> >> >> your suggestions for the docs and the ideal case then is you'd file a
> >> >> Pull Request
> >> >> (https://cwiki.apache.org/confluence/display/NIFI/Contributor+Guide)
> >> >> which actually provides the suggested documentation changes.
> >> >>
> >> >> For the ListFile/FetchFile -> PutSFTP[1] side the key property on
> >> >> PutSFTP to set is 'Remote Path'.  You'll want this value to have the
> >> >> base directory you need to write to which could be './' or could be
> >> >> 'some/place/to/write/to' and you'll also want it to reflect the
> >> >> directory structure from which you fetched the file locally.  This
> >> >> will be available to you from the 'path' attribute of the flowfile.
> >> >> This is set by the ListFile processor (see writes attributes) [2].
> >> >>
> >> >> So putting these together you want your PutSFTP processor to have as
> a
> >> >> value for 'Remote Path' something like "thebasedir/fordata/${path}".
> >> >>
> >> >>
> >> >> [1]
> >> >>
> >> >>
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.PutSFTP/index.html
> >> >> [2]
> >> >>
> >> >>
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ListFile/index.html
> >> >>
> >> >> On Thu, Apr 6, 2017 at 11:43 AM, James Keeney 
> >> >> wrote:
> >> >> > Joe and Juan -
> >> >> >
> >> >> > Thank you very much for the help. It turned out to be the prompt
> for
> >> >> > verifying the authenticity of the host.
> >> >> >
> >> >> > With that fixed, I have a new question:
> >> >> >
> >> >> > I'm using ListFile and FetchFile to identify new files as they are
> >> >> > added
> >> >> > to
> >> >> > a directory. When they are I am using SFTP to transfer to another
> >> >> > server
> >> >> > behind the firewall. I'd like to be able to preserve the directory
> >> >> > structure
> >> >> > when I transfer the files. I set Create Directory to true but the
> >> >> > SFTP
> >> >> > transfer is always putting the files in the root.
> >> >> >
> >> >> > Any ideas?
> >> >> >
> >> >> > Here is how I resolved the first issue:
> >> >> >
> >> >> > I added the host keys to the known_hosts file and that did the
> trick.
> >> >> >
> >> >> > From a documentation perspective I'd suggest adding a little mor

Re: Help with SFTP processor

2017-04-06 Thread Joe Witt
No problem.  Remember you can use live queue listing and data viewing
to see all the attributes we know about the object at each stage.
That is exactly how I figured out how to wire this together and what I
needed from each step.

Thanks
Joe

On Thu, Apr 6, 2017 at 4:43 PM, James Keeney  wrote:
> Thank you. That was the final detail I was not getting. The use of the
> ${path} expression variable. I now see that I needed to look at the writes
> attributes section of the ListFile processor.
>
> Jim K.
>
> On Thu, Apr 6, 2017 at 2:30 PM Joe Witt  wrote:
>>
>> Jim,
>>
>> Yep I understand your question and how to support that is what I was
>> trying to convey.
>>
>> ListFile should pull from "/home/source".  Let's say it finds the file
>> '/home/source/test/newfile.txt'.
>>
>> The resulting flowfile will have an attribute called 'path' that says
>> 'test'
>>
>> Then you use FetchFile to actually pull in the bytes.  'path' still says
>> 'test'
>>
>> Then you use PutSFTP with the 'Remote Path' set to "/www/files/${path}"
>>
>> I have just verified that this works myself locally using
>> List/Fetch/PutFile.  In your case you'd use PutSFTP.
>>
>> Thanks
>> Joe
>>
>> On Thu, Apr 6, 2017 at 12:51 PM, James Keeney 
>> wrote:
>> > Thanks for getting back to me. I will follow up on the documentation
>> > Pull
>> > Request.
>> >
>> > As to the directory question, I wasn't specific enough. I've already
>> > configured the setting you described.
>> >
>> > Here is what is going on:
>> >
>> > Say the source directory is /home/source and the destination is
>> > /www/files
>> >
>> > This works:
>> >
>> > If a user drops the file text.txt into /home/source then I want that to
>> > be
>> > /www/files/text.txt That is working as expected.
>> >
>> > This does not work
>> >
>> > If a user creates a subdirectory and drops a file, so
>> > /home/source/test/newfile.txt then I want the destination to reflect the
>> > subdirectory as in /www/files/test/newfile.txt But what happens is the
>> > file
>> > is being placed into the destination directory without the new
>> > subdirectory.
>> > So what is getting created is /www/files/newfile.txt and not
>> > /www/files/test/newfile.txt
>> >
>> > Any suggestions?
>> >
>> > Jim K.
>> >
>> >
>> > On Thu, Apr 6, 2017 at 12:19 PM Joe Witt  wrote:
>> >>
>> >> Jim,
>> >>
>> >> Glad you've made progress on the SFTP side.  Please file a JIRA with
>> >> your suggestions for the docs and the ideal case then is you'd file a
>> >> Pull Request
>> >> (https://cwiki.apache.org/confluence/display/NIFI/Contributor+Guide)
>> >> which actually provides the suggested documentation changes.
>> >>
>> >> For the ListFile/FetchFile -> PutSFTP[1] side the key property on
>> >> PutSFTP to set is 'Remote Path'.  You'll want this value to have the
>> >> base directory you need to write to which could be './' or could be
>> >> 'some/place/to/write/to' and you'll also want it to reflect the
>> >> directory structure from which you fetched the file locally.  This
>> >> will be available to you from the 'path' attribute of the flowfile.
>> >> This is set by the ListFile processor (see writes attributes) [2].
>> >>
>> >> So putting these together you want your PutSFTP processor to have as a
>> >> value for 'Remote Path' something like "thebasedir/fordata/${path}".
>> >>
>> >>
>> >> [1]
>> >>
>> >> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.PutSFTP/index.html
>> >> [2]
>> >>
>> >> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ListFile/index.html
>> >>
>> >> On Thu, Apr 6, 2017 at 11:43 AM, James Keeney 
>> >> wrote:
>> >> > Joe and Juan -
>> >> >
>> >> > Thank you very much for the help. It turned out to be the prompt for
>> >> > verifying the authenticity of the host.
>> >> >
>> >> > With that fixed, I have a new question:
>> >> >
>> >> > I'm using ListFile and FetchFile to identify new files as they are
>> >> > added
>> >> > to
>> >> > a directory. When they are I am using SFTP to transfer to another
>> >> > server
>> >> > behind the firewall. I'd like to be able to preserve the directory
>> >> > structure
>> >> > when I transfer the files. I set Create Directory to true but the
>> >> > SFTP
>> >> > transfer is always putting the files in the root.
>> >> >
>> >> > Any ideas?
>> >> >
>> >> > Here is how I resolved the first issue:
>> >> >
>> >> > I added the host keys to the known_hosts file and that did the trick.
>> >> >
>> >> > From a documentation perspective I'd suggest adding a little more
>> >> > guidance
>> >> > for people who are not familiar with SFTP. It was not clear to me
>> >> > what
>> >> > the
>> >> > two parameters are:
>> >> >
>> >> > Host Key File
>> >> > Private Key Path
>> >> >
>> >> > Since I didn't understand the use of the known_hosts file (using
>> >> > key-scan to
>> >> > get their keys and adding those to the file), I had no idea what to put
>> >> > in
>> >> > Host Key File. Also, private key path confused me since it is t

Re: Help with SFTP processor

2017-04-06 Thread James Keeney
Thank you. That was the final detail I was missing: the use of the
${path} expression variable. I now see that I needed to look at the writes
attributes section of the ListFile processor.

Jim K.

On Thu, Apr 6, 2017 at 2:30 PM Joe Witt  wrote:

> Jim,
>
> Yep I understand your question and how to support that is what I was
> trying to convey.
>
> ListFile should pull from "/home/source".  Let's say it finds the file
> '/home/source/test/newfile.txt'.
>
> The resulting flowfile will have an attribute called 'path' that says
> 'test'
>
> Then you use FetchFile to actually pull in the bytes.  'path' still says
> 'test'
>
> Then you use PutSFTP with the 'Remote Path' set to "/www/files/${path}"
>
> I have just verified that this works myself locally using
> List/Fetch/PutFile.  In your case you'd use PutSFTP.
>
> Thanks
> Joe
>
> On Thu, Apr 6, 2017 at 12:51 PM, James Keeney 
> wrote:
> > Thanks for getting back to me. I will follow up on the documentation Pull
> > Request.
> >
> > As to the directory question, I wasn't specific enough. I've already
> > configured the setting you described.
> >
> > Here is what is going on:
> >
> > Say the source directory is /home/source and the destination is
> /www/files
> >
> > This works:
> >
> > If a user drops the file text.txt into /home/source then I want that to
> be
> > /www/files/text.txt That is working as expected.
> >
> > This does not work
> >
> > If a user creates a subdirectory and drops a file, so
> > /home/source/test/newfile.txt then I want the destination to reflect the
> > subdirectory as in /www/files/test/newfile.txt But what happens is the
> file
> > is being placed into the destination directory without the new
> subdirectory.
> > So what is getting created is /www/files/newfile.txt and not
> > /www/files/test/newfile.txt
> >
> > Any suggestions?
> >
> > Jim K.
> >
> >
> > On Thu, Apr 6, 2017 at 12:19 PM Joe Witt  wrote:
> >>
> >> Jim,
> >>
> >> Glad you've made progress on the SFTP side.  Please file a JIRA with
> >> your suggestions for the docs and the ideal case then is you'd file a
> >> Pull Request
> >> (https://cwiki.apache.org/confluence/display/NIFI/Contributor+Guide)
> >> which actually provides the suggested documentation changes.
> >>
> >> For the ListFile/FetchFile -> PutSFTP[1] side the key property on
> >> PutSFTP to set is 'Remote Path'.  You'll want this value to have the
> >> base directory you need to write to which could be './' or could be
> >> 'some/place/to/write/to' and you'll also want it to reflect the
> >> directory structure from which you fetched the file locally.  This
> >> will be available to you from the 'path' attribute of the flowfile.
> >> This is set by the ListFile processor (see writes attributes) [2].
> >>
> >> So putting these together you want your PutSFTP processor to have as a
> >> value for 'Remote Path' something like "thebasedir/fordata/${path}".
> >>
> >>
> >> [1]
> >>
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.PutSFTP/index.html
> >> [2]
> >>
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ListFile/index.html
> >>
> >> On Thu, Apr 6, 2017 at 11:43 AM, James Keeney 
> >> wrote:
> >> > Joe and Juan -
> >> >
> >> > Thank you very much for the help. It turned out to be the prompt for
> >> > verifying the authenticity of the host.
> >> >
> >> > With that fixed, I have a new question:
> >> >
> >> > I'm using ListFile and FetchFile to identify new files as they are
> added
> >> > to
> >> > a directory. When they are I am using SFTP to transfer to another
> server
> >> > behind the firewall. I'd like to be able to preserve the directory
> >> > structure
> >> > when I transfer the files. I set Create Directory to true but the SFTP
> >> > transfer is always putting the files in the root.
> >> >
> >> > Any ideas?
> >> >
> >> > Here is how I resolved the first issue:
> >> >
> >> > I added the host keys to the known_hosts file and that did the trick.
> >> >
> >> > From a documentation perspective I'd suggest adding a little more
> >> > guidance
> >> > for people who are not familiar with SFTP. It was not clear to me what
> >> > the
> >> > two parameters are:
> >> >
> >> > Host Key File
> >> > Private Key Path
> >> >
> >> > Since I didn't understand the use of the known_hosts file (using
> >> > key-scan to
> >> > get their keys and adding those to the file), I had no idea what to put
> in
> >> > Host Key File. Also, private key path confused me since it is the
> public
> >> > key
> >> > that is being shared. Also, we might want to highlight that the SFTP
> >> > process
> >> > needs to go forward without prompts. I had gotten used to the prompt, so
> >> > when
> >> > I tested I didn't even think about that issue.
> >> >
> >> > How might I go about adding my sweat equity to update the
> documentation
> >> > to
> >> > make it a little clearer? Let me know and I will take a crack at
> >> > expanding
> >> > the information to help other use

Re: Is it possible to dynamically spawn a Processor Group ?

2017-04-06 Thread Stephen-Talk
Thanks for the quick reply.

Yes, that is quite correct.
The scenario is the following:

The input flow is a GetFile processor that collects CSV files
(>100,000 lines each), queues each file, and passes each line to a
locally built processor (say, MyImportProcessor) that submits it via
the REST API to a Drupal website.
The process works, but it is very slow; I would like to speed it up by
splitting the CSV file into chunks so that "MyImportProcessor" can be
spawned as many times as required.
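The chunking step can be sketched as follows (a generic illustration, not NiFi-specific; NiFi's SplitText processor offers similar header-preserving splitting natively):

```python
def split_csv(lines, chunk_size):
    """Yield chunks of at most chunk_size data lines, repeating the
    header line at the top of each chunk."""
    header, rows = lines[0], lines[1:]
    for i in range(0, len(rows), chunk_size):
        yield [header] + rows[i:i + chunk_size]

parts = list(split_csv(["name,qty", "a,1", "b,2", "c,3"], 2))
print(len(parts))  # 2 chunks, each starting with the header line
```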


On 06/04/2017 20:47, Jeff wrote:
> Hello Stephen,
> 
> It's possible to watch the status of NiFi, and upon observing a
> particular status in which you're interested, you can use the REST API
> to create new processor groups.  You'd also have to populate that
> processor group with processors and other components.  Based on the
> scenario you mentioned, though, it sounds like you are looking at being
> able to scale up available processing (via more concurrent threads, or
> more nodes in a cluster) once a certain amount of data is queued up and
> waiting to be processed, rather than adding components to the existing
> flow.  Is that correct?
> 
> On Thu, Apr 6, 2017 at 3:30 PM Stephen-Talk wrote:
> 
> Hi, I am just a Nifi Inquisitor,
> 
> Is it, or could it be possible to Dynamically spawn a "Processor Group"
> when the input flow reaches a certain threshold.
> 
> Thanking you in anticipation.
> Stephen
> 


Re: How can datetime to month conversion failed in french language?

2017-04-06 Thread Jeff
What is the expression language statement that you're attempting to use?

On Thu, Apr 6, 2017 at 3:12 AM prabhu Mahendran 
wrote:

> In NiFi, how does the JVM check the language of the machine?
>
> Does it take a default language like English (US), or the system's
> selected date-time language?
>
> I face an issue converting a datetime into a month name using the
> expression language, with the NiFi package installed on a French OS.
>
> But it worked with English (US) as the selected language.
>
> Can anyone help me to resolve this?
>


Re: Is it possible to dynamically spawn a Processor Group ?

2017-04-06 Thread Jeff
Hello Stephen,

It's possible to watch the status of NiFi, and upon observing a particular
status in which you're interested, you can use the REST API to create new
processor groups.  You'd also have to populate that processor group with
processors and other components.  Based on the scenario you mentioned,
though, it sounds like you are looking at being able to scale up available
processing (via more concurrent threads, or more nodes in a cluster) once a
certain amount of data is queued up and waiting to be processed, rather
than adding components to the existing flow.  Is that correct?
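As a sketch of the REST call Jeff mentions, the entity body for creating a child process group can be built like this. This is a minimal, assumed shape of the ProcessGroupEntity; check the NiFi REST API documentation for the authoritative schema and endpoint (POST /nifi-api/process-groups/{parentId}/process-groups):

```python
import json

def process_group_payload(name, x=0.0, y=0.0):
    """Minimal entity body for creating a child process group.
    A revision of 0 is used for components that do not yet exist."""
    return {
        "revision": {"version": 0},
        "component": {"name": name, "position": {"x": x, "y": y}},
    }

body = process_group_payload("spawned-import-group")
print(json.dumps(body, sort_keys=True))
```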

On Thu, Apr 6, 2017 at 3:30 PM Stephen-Talk 
wrote:

> Hi, I am just a Nifi Inquisitor,
>
> Is it, or could it be possible to Dynamically spawn a "Processor Group"
> when the input flow reaches a certain threshold.
>
> Thanking you in anticipation.
> Stephen
>


Is it possible to dynamically spawn a Processor Group ?

2017-04-06 Thread Stephen-Talk
Hi, I am just a NiFi inquisitor.

Is it possible, or could it be made possible, to dynamically spawn a
"Processor Group" when the input flow reaches a certain threshold?

Thanking you in anticipation.
Stephen


Re: Help with SFTP processor

2017-04-06 Thread Joe Witt
Jim,

Yep I understand your question and how to support that is what I was
trying to convey.

ListFile should pull from "/home/source".  Let's say it finds the
'/home/source/test/newfile.txt' file.

The resulting flowfile will have an attribute called 'path' that says 'test'

Then you use FetchFile to actually pull in the bytes.  'path' still says 'test'

Then you use PutSFTP with the 'Remote Path' set to "/www/files/${path}"

I have just verified that this works myself locally using
List/Fetch/PutFile.  In your case you'd use PutSFTP.
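The path arithmetic above can be sketched as follows (a simplified model of how the 'path' attribute from ListFile ends up inside PutSFTP's Remote Path; the function name is illustrative):

```python
import posixpath

def remote_target(remote_base, path_attr, filename):
    # ListFile records the listing-relative directory in the 'path' attribute;
    # setting Remote Path to "<remote_base>/${path}" re-applies that relative
    # directory under the remote base before the filename is appended.
    return posixpath.join(remote_base, path_attr, filename)

print(remote_target("/www/files", "test", "newfile.txt"))
# /www/files/test/newfile.txt
```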

Thanks
Joe

On Thu, Apr 6, 2017 at 12:51 PM, James Keeney  wrote:
> Thanks for getting back to me. I will follow up on the documentation Pull
> Request.
>
> As to the directory question, I wasn't specific enough. I've already
> configured the setting you described.
>
> Here is what is going on:
>
> Say the source directory is /home/source and the destination is /www/files
>
> This works:
>
> If a user drops the file text.txt into /home/source then I want that to be
> /www/files/text.txt. That is working as expected.
>
> This does not work
>
> If a user creates a subdirectory and drops a file, so
> /home/source/test/newfile.txt, then I want the destination to reflect the
> subdirectory, as in /www/files/test/newfile.txt. But what happens is the file
> is being placed into the destination directory without the new subdirectory.
> So what is getting created is /www/files/newfile.txt and not
> /www/files/test/newfile.txt
>
> Any suggestions?
>
> Jim K.
>
>
> On Thu, Apr 6, 2017 at 12:19 PM Joe Witt  wrote:
>>
>> Jim,
>>
>> Glad you've made progress on the SFTP side.  Please file a JIRA with
>> your suggestions for the docs and the ideal case then is you'd file a
>> Pull Request
>> (https://cwiki.apache.org/confluence/display/NIFI/Contributor+Guide)
>> which actually provides the suggested documentation changes.
>>
>> For the ListFile/FetchFile -> PutSFTP[1] side the key property on
>> PutSFTP to set is 'Remote Path'.  You'll want this value to have the
>> base directory you need to write to which could be './' or could be
>> 'some/place/to/write/to' and you'll also want it to reflect the
>> directory structure from which you fetched the file locally.  This
>> will be available to you from the 'path' attribute of the flowfile.
>> This is set by the ListFile processor (see writes attributes) [2].
>>
>> So putting these together you want your PutSFTP processor to have as a
>> value for 'Remote Path' something like "thebasedir/fordata/${path}".
>>
>>
>> [1]
>> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.PutSFTP/index.html
>> [2]
>> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ListFile/index.html
>>
>> On Thu, Apr 6, 2017 at 11:43 AM, James Keeney 
>> wrote:
>> > Joe and Juan -
>> >
>> > Thank you very much for the help. It turned out to be the prompt for
>> > verifying the authenticity of the host.
>> >
>> > With that fixed, I have a new question:
>> >
>> > I'm using ListFile and FetchFile to identify new files as they are added
>> > to
>> > a directory. When they are I am using SFTP to transfer to another server
>> > behind the firewall. I'd like to be able to preserve the directory
>> > structure
>> > when I transfer the files. I set Create Directory to true but the SFTP
>> > transfer is always putting the files in the root.
>> >
>> > Any ideas?
>> >
>> > Here is how I resolved the first issue:
>> >
>> > I added the host keys to the known_hosts file and that did the trick.
>> >
>> > From a documentation perspective I'd suggest adding a little more
>> > guidance
>> > for people who are not familiar with SFTP. It was not clear to me what
>> > the
>> > two parameters are:
>> >
>> > Host Key File
>> > Private Key Path
>> >
>> > Since I didn't understand the use of the known_hosts file (using
>> > key-scan to
>> > get their keys and adding those to the file) I had no idea what to put in
>> > Host Key File. Also, private key path confused me since it is the public
>> > key
>> > that is being shared. Also, we might want to highlight that the SFTP
>> > process
>> > needs to go forward without prompts. I had gotten used to the prompt so
>> > when
>> > I tested I didn't even think about that issue.
>> >
>> > How might I go about adding my sweat equity to update the documentation
>> > to
>> > make it a little clearer? Let me know and I will take a crack at
>> > expanding
>> > the information to help other users.
>> >
>> > Thanks.
>> >
>> > Jim K.
>> >
>> > On Tue, Apr 4, 2017 at 12:31 PM Joe Witt  wrote:
>> >>
>> >> definitely agree with Juan's suggestion to get more details on what
>> >> the actual authentication process is when trying ssh -vvv.  Also, be
>> >> sure to check what order of authorization occurs.  It is possible it
>> >> is trying keyboard-interactive before the certs and this could create
>> >> problems so ordering there, on the server side, will really matter.
>> >>
>> >> On Tue, Apr 4, 2017 at 12:27 P

Re: nifi attributes logics

2017-04-06 Thread Pompilio Ramirez
Thank you Andy.

We have n data points that send data to us to normalize, and our
end points have different normalization requirements.
We are also juggling ease of dataflow configuration ( many people
implementing dataflows ), so we are trying to keep things simple and have
only one place a DFM needs to go to configure a dataflow.

So dataFromSiteX comes into our system.

That data is in X format; we "normalize" it through IdentifyMimeType into
a standard format we have. And then we route based on who our end
customers are and their requirements ( zip'd / merged to X size /
proprietary stuff ).

We are trying to find the most efficient method that balances dataflow
configuration simplicity ( CM / DFM ) versus technical efficiency.
To meet the requirement we could certainly accomplish it using multiple
update/route processors on different pieces of the dataflow, but then a DFM
would have to know to configure many UpdateAttributes.

We are going to do something similar to your suggestion and probably extend
the RouteOnAttribute code. We are also going to create an external "U/I
or script or something" that will allow a DFM to configure a dataflow; that
way we can "validate" the initial dataflow configuration, since we
could have many attributes and settings that we want to pre-populate one
time and in one place. It feels weird to have a separate UI that populates
the "nifi UI", but validating a dataflow through the dataflow lifecycle
builds extra complexity for CM / Tier 1 troubleshooting.


On Thu, Apr 6, 2017 at 1:20 PM Andy LoPresto  wrote:

> Hi,
>
> I’m not sure I fully understand the question, as you say that “customers”
> have attributes, but I was under the impression that the customers were
> various processors/“endpoints”. Flowfiles are the atomic units of data
> passing through the flow, and that is where attributes are stored.
>
> Regardless, to perform complex string comparisons, I believe your best
> option is an ExecuteScript processor. In any of the supported languages,
> you can quickly parse the attributes into collections (i.e. java.util.List)
> via String split on delimiter, perform set arithmetic (A - B), and then
> route based on the results.
>
> Andy LoPresto
> alopre...@apache.org
> *alopresto.apa...@gmail.com *
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Apr 6, 2017, at 7:58 AM, Pompilio Ramirez  wrote:
>
> Hello,
>
> I can't find a way to take 2 attributes and create an attribute with only
> the difference between them.
>
> Has anyone accomplished that? In general want to define routing to
> processing groups that do things. I want to define end points ( say
> customer1 / customer2  / customer3 )
> I get one flowfile and I define what the customer(s) need to normalize the
> data using advanced UpdateAttribute.
>
> So flowfile condition is met and I build attributes under actions:
>
> So I enrich flowfile with attribute:
>
> So customer1 would have an attribute "ACTIONS" that says ( convert / merge
> / enrich / convert )
> Customer2 has "ACTIONS" ( convert / merge / gzip )
> Customer3 has "ACTIONS" ( convert / merge  )
>
> I then want to take that flowfile and have logic that says all customers
> need convert / merge so I'll take that flowfile and do convert ( this is
> very expensive ) ( or do all the common steps as one flowfile ) before they
> clone.
>
> Knowing my steps are linear so just want to find a way to build logic to
> know my common steps and create attributes with the difference.
>
> thoughts?
>
>
>


Re: ExtractMediaMetadata temp files

2017-04-06 Thread Joe Skora
Dan,

You are welcome.

You can track it on NIFI-3677
.

Regards,
Joe

On Thu, Apr 6, 2017 at 1:32 PM, Dan Morris  wrote:

> Thanks!
>
>
> Thanks,
>
> Dan Morris
> 443-992-2848 <(443)%20992-2848>
>
> On Thu, Apr 6, 2017 at 1:25 PM, Joe Skora  wrote:
>
>> Dan,
>>
>> Good find, this looks like a bug in ExtractMediaMetadata.
>>
>> I'm creating a Jira ticket to fix it.  I'll reply with the ticket number
>> once created.
>>
>> Regards,
>> JoeS
>>
>> On Thu, Apr 6, 2017 at 1:01 PM, Dan Morris  wrote:
>>
>>> Hello,
>>>
>>> I'm using the ExtractMediaMetadata to make use of apache tika to
>>> validate content types and metadata.   It appears that this processor is
>>> filling up my /tmp folder with apache-tika-.tmp
>>> files.
>>>
>>> Is there a way to get the processor or nifi to clean up these tmp files?
>>>
>>> I'm currently using nifi-0.7.0
>>>
>>> Thanks,
>>>
>>> Dan Morris
>>>
>>
>>
>


Re: ExtractMediaMetadata temp files

2017-04-06 Thread Dan Morris
Thanks!


Thanks,

Dan Morris
443-992-2848

On Thu, Apr 6, 2017 at 1:25 PM, Joe Skora  wrote:

> Dan,
>
> Good find, this looks like a bug in ExtractMediaMetadata.
>
> I'm creating a Jira ticket to fix it.  I'll reply with the ticket number
> once created.
>
> Regards,
> JoeS
>
> On Thu, Apr 6, 2017 at 1:01 PM, Dan Morris  wrote:
>
>> Hello,
>>
>> I'm using the ExtractMediaMetadata to make use of apache tika to validate
>> content types and metadata.   It appears that this processor is filling up
>> my /tmp folder with apache-tika-.tmp files.
>>
>> Is there a way to get the processor or nifi to clean up these tmp files?
>>
>> I'm currently using nifi-0.7.0
>>
>> Thanks,
>>
>> Dan Morris
>>
>
>


Re: ExtractMediaMetadata temp files

2017-04-06 Thread Joe Skora
Dan,

Good find, this looks like a bug in ExtractMediaMetadata.

I'm creating a Jira ticket to fix it.  I'll reply with the ticket number
once created.

Regards,
JoeS

On Thu, Apr 6, 2017 at 1:01 PM, Dan Morris  wrote:

> Hello,
>
> I'm using the ExtractMediaMetadata to make use of apache tika to validate
> content types and metadata.   It appears that this processor is filling up
> my /tmp folder with apache-tika-.tmp files.
>
> Is there a way to get the processor or nifi to clean up these tmp files?
>
> I'm currently using nifi-0.7.0
>
> Thanks,
>
> Dan Morris
>


Re: nifi attributes logics

2017-04-06 Thread Andy LoPresto
Hi,

I’m not sure I fully understand the question, as you say that “customers” have 
attributes, but I was under the impression that the customers were various 
processors/“endpoints”. Flowfiles are the atomic units of data passing through 
the flow, and that is where attributes are stored.

Regardless, to perform complex string comparisons, I believe your best option 
is an ExecuteScript processor. In any of the supported languages, you can 
quickly parse the attributes into collections (i.e. java.util.List) via String 
split on delimiter, perform set arithmetic (A - B), and then route based on the 
results.

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Apr 6, 2017, at 7:58 AM, Pompilio Ramirez  wrote:
> 
> Hello,
> 
> I can't find a way to take 2 attributes and create an attribute with only the
> difference between them.
> 
> Has anyone accomplished that? In general want to define routing to processing 
> groups that do things. I want to define end points ( say customer1 / 
> customer2  / customer3 )
> I get one flowfile and I define what the customer(s) need to normalize the 
> data using advanced UpdateAttribute.
> 
> So flowfile condition is met and I build attributes under actions:
> 
> So I enrich flowfile with attribute:
> 
> So customer1 would have an attribute "ACTIONS" that says ( convert / merge / 
> enrich / convert )
> Customer2 has "ACTIONS" ( convert / merge / gzip )
> Customer3 has "ACTIONS" ( convert / merge  )
> 
> I then want to take that flowfile and have logic that says all customers need 
> convert / merge so I'll take that flowfile and do convert ( this is very 
> expensive ) ( or do all the common steps as one flowfile ) before they clone.
> 
> Knowing my steps are linear so just want to find a way to build logic to know 
> my common steps and create attributes with the difference.
> 
> thoughts?



signature.asc
Description: Message signed with OpenPGP using GPGMail


ExtractMediaMetadata temp files

2017-04-06 Thread Dan Morris
Hello,

I'm using the ExtractMediaMetadata to make use of apache tika to validate
content types and metadata.   It appears that this processor is filling up
my /tmp folder with apache-tika-.tmp files.

Is there a way to get the processor or nifi to clean up these tmp files?

I'm currently using nifi-0.7.0

Thanks,

Dan Morris
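Until a build with the upstream fix is available, one hedged workaround is an external sweep of stale Tika temp files, e.g. run periodically from cron. A minimal sketch; the filename pattern matches the report above, and the age threshold is arbitrary:

```python
import glob
import os
import time

def clean_tika_tmp(tmp_dir="/tmp", max_age_seconds=3600):
    """Remove apache-tika-*.tmp files older than max_age_seconds."""
    cutoff = time.time() - max_age_seconds
    removed = []
    for path in glob.glob(os.path.join(tmp_dir, "apache-tika-*.tmp")):
        try:
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed.append(path)
        except OSError:
            pass  # file vanished or is being written; leave it alone
    return removed
```

The age threshold exists so that a temp file Tika is actively using is not deleted out from under the processor.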


Re: Help with SFTP processor

2017-04-06 Thread James Keeney
Thanks for getting back to me. I will follow up on the documentation Pull
Request.

As to the directory question, I wasn't specific enough. I've already
configured the setting you described.

Here is what is going on:

Say the source directory is /home/source and the destination is /www/files

This works:

If a user drops the file text.txt into /home/source then I want that to be
/www/files/text.txt. That is working as expected.

This does not work:

If a user creates a subdirectory and drops a file, so
/home/source/test/newfile.txt, then I want the destination to reflect the
subdirectory, as in /www/files/test/newfile.txt. But what happens is the file
is being placed into the destination directory without the new
subdirectory. So what is getting created is /www/files/newfile.txt and not
/www/files/test/newfile.txt

Any suggestions?

Jim K.


On Thu, Apr 6, 2017 at 12:19 PM Joe Witt  wrote:

> Jim,
>
> Glad you've made progress on the SFTP side.  Please file a JIRA with
> your suggestions for the docs and the ideal case then is you'd file a
> Pull Request (
> https://cwiki.apache.org/confluence/display/NIFI/Contributor+Guide)
> which actually provides the suggested documentation changes.
>
> For the ListFile/FetchFile -> PutSFTP[1] side the key property on
> PutSFTP to set is 'Remote Path'.  You'll want this value to have the
> base directory you need to write to which could be './' or could be
> 'some/place/to/write/to' and you'll also want it to reflect the
> directory structure from which you fetched the file locally.  This
> will be available to you from the 'path' attribute of the flowfile.
> This is set by the ListFile processor (see writes attributes) [2].
>
> So putting these together you want your PutSFTP processor to have as a
> value for 'Remote Path' something like "thebasedir/fordata/${path}".
>
>
> [1]
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.PutSFTP/index.html
> [2]
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ListFile/index.html
>
> On Thu, Apr 6, 2017 at 11:43 AM, James Keeney 
> wrote:
> > Joe and Juan -
> >
> > Thank you very much for the help. It turned out to be the prompt for
> > verifying the authenticity of the host.
> >
> > With that fixed, I have a new question:
> >
> > I'm using ListFile and FetchFile to identify new files as they are added
> to
> > a directory. When they are I am using SFTP to transfer to another server
> > behind the firewall. I'd like to be able to preserve the directory
> structure
> > when I transfer the files. I set Create Directory to true but the SFTP
> > transfer is always putting the files in the root.
> >
> > Any ideas?
> >
> > Here is how I resolved the first issue:
> >
> > I added the host keys to the known_hosts file and that did the trick.
> >
> > From a documentation perspective I'd suggest adding a little more
> guidance
> > for people who are not familiar with SFTP. It was not clear to me what
> the
> > two parameters are:
> >
> > Host Key File
> > Private Key Path
> >
> > Since I didn't understand the use of the known_hosts file (using
> key-scan to
> > get their keys and adding those to the file) I had no idea what to put in
> > Host Key File. Also, private key path confused me since it is the public
> key
> > that is being shared. Also, we might want to highlight that the SFTP
> process
> > needs to go forward without prompts. I had gotten used to the prompt so
> when
> > I tested I didn't even think about that issue.
> >
> > How might I go about adding my sweat equity to update the documentation
> to
> > make it a little clearer? Let me know and I will take a crack at
> expanding
> > the information to help other users.
> >
> > Thanks.
> >
> > Jim K.
> >
> > On Tue, Apr 4, 2017 at 12:31 PM Joe Witt  wrote:
> >>
> >> definitely agree with Juan's suggestion to get more details on what
> >> the actual authentication process is when trying ssh -vvv.  Also, be
> >> sure to check what order of authorization occurs.  It is possible it
> >> is trying keyboard-interactive before the certs and this could create
> >> problems so ordering there, on the server side, will really matter.
> >>
> >> On Tue, Apr 4, 2017 at 12:27 PM, Juan Sequeiros 
> >> wrote:
> >> > Good afternoon,
> >> >
> >> > I would try this command from command line:
> >> >
> >> > ssh -vvv -i  user@server
> >> >
> >> > Example:
> >> >
> >> > ssh -vvv -i /some/path/.ssh/id_rsa nifi@10.10.10.10
> >> >
> >> > If that works then I would double check the "private key path"
> property
> >> > of
> >> > your GetSFTP it should point to the fully qualified file to the
> private
> >> > key
> >> > path.
> >> >
> >> > If it does not work then the -vvv option should give you more error.
> >> >
> >> >
> >> >
> >> > On Tue, Apr 4, 2017 at 9:47 AM James Keeney 
> >> > wrote:
> >> >>
> >> >> I am using SFTP to transfer files between two servers. I have tried
> >> >> multiple configurations to try to get the authentication to

Re: Multiple log entries from InvokeScriptedProcessor initialize() ?

2017-04-06 Thread James McMahon
Hi. Wanted to close the loop in case it might help someone in the future
who has a similar interest. This is how I got logging to work from an
InvokeScriptedProcessor processor to a specific log file of my choosing,
executed from a Jython script (some hybrid of java and python that I don't
yet fully understand). I'm running NiFi 0.7.x, and I'm running Python 2.6.6.



Please do take my stuff with a grain of salt. Beyond dogged persistence I
don't have much experience going for me.



I built this Frankenstein monster with:

o initial guidance from Matt B. and Joe W. of this Nifi users group,

o a NiFi example in Git (
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-scripting-bundle/nifi-scripting-processors/src/test/resources/jython/test_update_attribute.py
),

o this stackoverflow discussion (
http://stackoverflow.com/questions/6729268/python-logging-messages-appearing-twice
)

and

o a PySet problem solution from Michael K. available at the Hortonworks
site (
https://community.hortonworks.com/questions/75420/invokescriptedprocessor-in-python.html
).  A big thank you for your assistance.



-Jim



import sys
import traceback
import logging
from org.apache.nifi.processor import Processor
from org.apache.nifi.processor import Relationship
from org.apache.nifi.components import PropertyDescriptor
from org.apache.nifi.processor.util import StandardValidators
from org.python.core import PySet


class UpdateAttributes(Processor) :

    def __init__(self) :
        self.__rel_success = Relationship.Builder().name("success").description("Success").build()

    def initialize(self, context) :
        try :
            # create a logger associated with this InvokeScriptedProcessor...
            self.logger = logging.getLogger('nifi_ISP_1')
            self.logger.setLevel(logging.DEBUG)
            # DON'T create your logging file handler here because it must be
            # established, used, and discarded with each processing cycle.
            # Else we get all sorts of wacky multiples of outputs in our log file.
        except :
            pass

    def getRelationships(self) :
        return PySet([self.__rel_success])

    def validate(self, context) :
        return None

    def getPropertyDescriptor(self) :
        return None

    def getPropertyDescriptors(self) :
        emptyList = []
        return emptyList

    def onPropertyModified(self, descriptor, newValue, oldValue) :
        pass

    def onTrigger(self, context, sessionFactory) :
        session = sessionFactory.createSession()
        try :
            # ensure we have some work to do (TBD: try grabbing many files at
            # once rather than one at a time to optimize throughput)
            flowfile = session.get()
            if flowfile is None :
                return

            # Establish a file handler that logs even debug messages...
            fh = logging.FileHandler('/home.nifi/latest/logs/ISP.log')
            fh.setLevel(logging.DEBUG)
            # Create a formatter and add it to the handler...
            formatter = logging.Formatter('%(asctime)-15s %(message)s')
            fh.setFormatter(formatter)
            self.logger.addHandler(fh)
            self.logger.info('About to process file %s', flowfile.getAttribute("filename"))

            # Extract an attribute of interest...
            # fromPropertyValue = context.getProperty("for-attributes").getValue()
            fromAttributeValue = flowfile.getAttribute("filename")

            # Set the attribute to a new value...
            # flowfile = session.putAttribute(flowfile, "from-property", fromPropertyValue)
            flowfile = session.putAttribute(flowfile, "filename", "Larry_Curley_Moe.txt")
            self.logger.info('File renamed to %s', flowfile.getAttribute("filename"))

Re: Help with SFTP processor

2017-04-06 Thread Joe Witt
Jim,

Glad you've made progress on the SFTP side.  Please file a JIRA with
your suggestions for the docs and the ideal case then is you'd file a
Pull Request 
(https://cwiki.apache.org/confluence/display/NIFI/Contributor+Guide)
which actually provides the suggested documentation changes.

For the ListFile/FetchFile -> PutSFTP[1] side the key property on
PutSFTP to set is 'Remote Path'.  You'll want this value to have the
base directory you need to write to which could be './' or could be
'some/place/to/write/to' and you'll also want it to reflect the
directory structure from which you fetched the file locally.  This
will be available to you from the 'path' attribute of the flowfile.
This is set by the ListFile processor (see writes attributes) [2].

So putting these together you want your PutSFTP processor to have as a
value for 'Remote Path' something like "thebasedir/fordata/${path}".


[1] 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.PutSFTP/index.html
[2] 
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ListFile/index.html

On Thu, Apr 6, 2017 at 11:43 AM, James Keeney  wrote:
> Joe and Juan -
>
> Thank you very much for the help. It turned out to be the prompt for
> verifying the authenticity of the host.
>
> With that fixed, I have a new question:
>
> I'm using ListFile and FetchFile to identify new files as they are added to
> a directory. When they are I am using SFTP to transfer to another server
> behind the firewall. I'd like to be able to preserve the directory structure
> when I transfer the files. I set Create Directory to true but the SFTP
> transfer is always putting the files in the root.
>
> Any ideas?
>
> Here is how I resolve the first issue:
>
> I added the host keys to the known_hosts file and that did the trick.
>
> From a documentation perspective I'd suggest adding a little more guidance
> for people who are not familiar with SFTP. It was not clear to me what the
> two parameters are:
>
> Host Key File
> Private Key Path
>
> Since I didn't understand the use of the known_hosts file (using key-scan to
> get their keys and adding those to the file) I had no idea what to put in
> Host Key File. Also, private key path confused me since it is the public key
> that is being shared. Also, we might want to highlight that the SFTP process
> needs to go forward without prompts. I had gotten used to the prompt so when
> I tested I didn't even think about that issue.
>
> How might I go about adding my sweat equity to update the documentation to
> make it a little clearer? Let me know and I will take a crack at expanding
> the information to help other users.
>
> Thanks.
>
> Jim K.
>
> On Tue, Apr 4, 2017 at 12:31 PM Joe Witt  wrote:
>>
>> definitely agree with Juan's suggestion to get more details on what
>> the actual authentication process is when trying ssh -vvv.  Also, be
>> sure to check what order of authorization occurs.  It is possible it
>> is trying keyboard-interactive before the certs and this could create
>> problems so ordering there, on the server side, will really matter.
>>
>> On Tue, Apr 4, 2017 at 12:27 PM, Juan Sequeiros 
>> wrote:
>> > Good afternoon,
>> >
>> > I would try this command from command line:
>> >
>> > ssh -vvv -i  user@server
>> >
>> > Example:
>> >
>> > ssh -vvv -i /some/path/.ssh/id_rsa nifi@10.10.10.10
>> >
>> > If that works then I would double check the "private key path" property
>> > of
>> > your GetSFTP it should point to the fully qualified file to the private
>> > key
>> > path.
>> >
>> > If it does not work then the -vvv option should give you more error.
>> >
>> >
>> >
>> > On Tue, Apr 4, 2017 at 9:47 AM James Keeney 
>> > wrote:
>> >>
>> >> I am using SFTP to transfer files between two servers. I have tried
>> >> multiple configurations to try to get the authentication to work but i
>> >> keep
>> >> getting the Auth Fail error. I'm able to go onto the Nifi server sftp
>> >> over
>> >> to the destination server but I cannot get it to work in Nifi. I'm just
>> >> not
>> >> sure how to debug this so I was hoping someone could help. Here are the
>> >> setting I'm using (I've replaced all the important details with
>> >> placeholders):
>> >>
>> >> Hostname: 
>> >> Port: 22 > >> server)
>> >> username: 
>> >> Host key File: 
>> >>
>> >> The error I am receiving is below. Any help would be greatly
>> >> appreciated.
>> >>
>> >> Thanks.
>> >>
>> >> Jim K
>> >>
>> >>
>> >>
>> >>
>> >> 2017-03-29 13:59:51,705 ERROR [Timer-Driven Process Thread-1]
>> >> o.a.nifi.processors.standard.PutSFTP
>> >> PutSFTP[id=fcdd2eb4-015a-1000-80c5-7406e6fca4c5] Unable to transfer
>> >>
>> >> StandardFlowFileRecord[uuid=a3569afa-7c80-4cec-9239-2424172e30d1,claim=StandardContentClaim
>> >> [resourceClaim=StandardResourceClaim[id=1490795991637-151,
>> >> container=default, section=151], offset=0,
>> >> length=2917388],offset=0,name=tulips_248_042214.jpg,size=2917388] to
>> >> remote
>> >> host  due to

Re: Help with SFTP processor

2017-04-06 Thread James Keeney
Joe and Juan -

Thank you very much for the help. It turned out to be the prompt for
verifying the authenticity of the host.

With that fixed, I have a new question:

I'm using ListFile and FetchFile to identify new files as they are added to
a directory. When they are I am using SFTP to transfer to another server
behind the firewall. I'd like to be able to preserve the directory
structure when I transfer the files. I set Create Directory to true but the
SFTP transfer is always putting the files in the root.

Any ideas?

Here is how I resolved the first issue:

I added the host keys to the known_hosts file and that did the trick.

From a documentation perspective I'd suggest adding a little more guidance
for people who are not familiar with SFTP. It was not clear to me what the
two parameters are:

Host Key File
Private Key Path

Since I didn't understand the use of the known_hosts file (using key-scan
to get their keys and adding those to the file) I had no idea what to put in
Host Key File. Also, private key path confused me since it is the public
key that is being shared. Also, we might want to highlight that the SFTP
process needs to go forward without prompts. I had gotten used to the prompt
so when I tested I didn't even think about that issue.

How might I go about adding my sweat equity to update the documentation to
make it a little clearer? Let me know and I will take a crack at expanding
the information to help other users.

Thanks.

Jim K.

On Tue, Apr 4, 2017 at 12:31 PM Joe Witt  wrote:

> definitely agree with Juan's suggestion to get more details on what
> the actual authentication process is when trying ssh -vvv.  Also, be
> sure to check what order of authorization occurs.  It is possible it
> is trying keyboard-interactive before the certs and this could create
> problems so ordering there, on the server side, will really matter.
>
> On Tue, Apr 4, 2017 at 12:27 PM, Juan Sequeiros 
> wrote:
> > Good afternoon,
> >
> > I would try this command from command line:
> >
> > ssh -vvv -i  user@server
> >
> > Example:
> >
> > ssh -vvv -i /some/path/.ssh/id_rsa nifi@10.10.10.10
> >
> > If that works then I would double check the "private key path" property
> of
> > your GetSFTP it should point to the fully qualified file to the private
> key
> > path.
> >
> > If it does not work then the -vvv option should give you more error.
> >
> >
> >
> > On Tue, Apr 4, 2017 at 9:47 AM James Keeney 
> wrote:
> >>
> >> I am using SFTP to transfer files between two servers. I have tried
> >> multiple configurations to try to get the authentication to work but i
> keep
> >> getting the Auth Fail error. I'm able to go onto the Nifi server sftp
> over
> >> to the destination server but I cannot get it to work in Nifi. I'm just
> not
> >> sure how to debug this so I was hoping someone could help. Here are the
> >> setting I'm using (I've replaced all the important details with
> >> placeholders):
> >>
> >> Hostname: 
> >> Port: 22  >> server)
> >> username: 
> >> Host key File: 
> >>
> >> The error I am receiving is below. Any help would be greatly
> appreciated.
> >>
> >> Thanks.
> >>
> >> Jim K
> >>
> >>
> >>
> >>
> >> 2017-03-29 13:59:51,705 ERROR [Timer-Driven Process Thread-1]
> >> o.a.nifi.processors.standard.PutSFTP
> >> PutSFTP[id=fcdd2eb4-015a-1000-80c5-7406e6fca4c5] Unable to transfer
> >>
> StandardFlowFileRecord[uuid=a3569afa-7c80-4cec-9239-2424172e30d1,claim=StandardContentClaim
> >> [resourceClaim=StandardResourceClaim[id=1490795991637-151,
> >> container=default, section=151], offset=0,
> >> length=2917388],offset=0,name=tulips_248_042214.jpg,size=2917388] to
> remote
> >> host  due to
> >> org.apache.nifi.processor.exception.ProcessException: IOException thrown
> >> from PutSFTP[id=fcdd2eb4-015a-1000-80c5-7406e6fca4c5]:
> java.io.IOException:
> >> Failed to obtain connection to remote host due to
> >> com.jcraft.jsch.JSchException: Auth fail: java.io.IOException: Failed to
> >> obtain connection to remote host due to com.jcraft.jsch.JSchException:
> Auth
> >> fail; routing to failure: java.io.IOException: Failed to obtain
> connection
> >> to remote host due to com.jcraft.jsch.JSchException: Auth fail
> >> 2017-03-29 13:59:51,706 ERROR [Timer-Driven Process Thread-1]
> >> o.a.nifi.processors.standard.PutSFTP
> >> java.io.IOException: Failed to obtain connection to remote host due to
> >> com.jcraft.jsch.JSchException: Auth fail
> >> at
> >>
> org.apache.nifi.processors.standard.util.SFTPTransfer.getChannel(SFTPTransfer.java:447)
> >> ~[nifi-standard-processors-1.1.0.jar:1.1.0]
> >> at
> >>
> org.apache.nifi.processors.standard.util.SFTPTransfer.put(SFTPTransfer.java:529)
> >> ~[nifi-standard-processors-1.1.0.jar:1.1.0]
> >> at
> >>
> org.apache.nifi.processors.standard.PutFileTransfer$1.process(PutFileTransfer.java:135)
> >> ~[nifi-standard-processors-1.1.0.jar:1.1.0]
> >> at
> >>
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession

nifi attributes logics

2017-04-06 Thread Pompilio Ramirez
Hello,

I can't find a way to take 2 attributes and create a new attribute containing
only the difference between them.

Has anyone accomplished that? In general I want to define routing to
process groups that do things. I want to define endpoints (say
customer1 / customer2 / customer3).
I get one flowfile and I define what each customer needs to normalize the
data using the advanced UpdateAttribute UI.

So flowfile condition is met and I build attributes under actions:

So I enrich flowfile with attribute:

So customer1 would have an attribute "ACTIONS" that says ( convert / merge
/ enrich / convert )
Customer2 has "ACTIONS" ( convert / merge / gzip )
Customer3 has "ACTIONS" ( convert / merge  )

I then want to take that flowfile and have logic that says all customers
need convert / merge, so I'll take that flowfile and do convert (this is
very expensive), or do all the common steps as one flowfile, before they
clone.

My steps are linear, so I just want to find a way to build logic that
identifies my common steps and creates attributes with the difference.

thoughts?
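If the ACTIONS steps always run in the same order, the shared work across customers is the longest common prefix of their step lists. A minimal sketch of that logic in Python, using the step values from the example above (attribute names are illustrative; in NiFi the result could be computed in an ExecuteScript and written back as flowfile attributes):

```python
def split_actions(actions):
    # "convert / merge / gzip" -> ["convert", "merge", "gzip"]
    return [step.strip() for step in actions.split("/") if step.strip()]

def common_prefix(pipelines):
    # Steps are linear, so the shared work is the longest common prefix.
    common = []
    for steps in zip(*pipelines):
        if all(s == steps[0] for s in steps):
            common.append(steps[0])
        else:
            break
    return common

customers = {
    "customer1": split_actions("convert / merge / enrich / convert"),
    "customer2": split_actions("convert / merge / gzip"),
    "customer3": split_actions("convert / merge"),
}
shared = common_prefix(list(customers.values()))  # ['convert', 'merge']
# Per-customer remainder: the steps each clone still needs after the
# common steps have been done once on the shared flowfile.
remainders = {name: steps[len(shared):] for name, steps in customers.items()}
```

The `shared` list drives the one-flowfile processing before cloning, and each clone gets its `remainders` entry as its own ACTIONS attribute.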


Re: GetHDFS and triggering

2017-04-06 Thread Arnaud G
Hi Matt,

Thank you very much for your time and detailed answer.

I will try to explain the use case a little more, as I suppose it should be
close to a standard pattern.

My current use case involves extracting data from a Web API. This data is
not as clean as we would like, but we have to cope with that. The data are
then stored as CSV files on HDFS, and an external Hive table points to the
directory. As the data contain multiple duplicates/errors that need to be
cleaned up before ingestion, we have a PrestoDB view on this table that
provides a clean data set.

The goal now is to select the data from this table and insert them into
another Hive/Presto table. For this I still need to join the data from the
landing table with the data in the destination table to ensure that I'm not
inserting duplicate records. I do this with Presto. Once done, I need to
move the files from the landing folder to another folder to empty/truncate
the external table. This is where my problem resides, as I'm struggling to
find an elegant way to trigger this file move only if the SQL process is
successful.

I tried to have a look at ListHDFS, but this processor, like GetHDFS,
cannot be triggered (except by CRON, of course). As I need to receive
enough data in this table to correlate them, I cannot have a flow process
based on each file, which rules out FetchHDFS. The direct conversion to ORC
does not work either, as I have to filter/deduplicate a lot of information
from the raw API data.

The last idea we had was to insert line by line into a landing table that
would act like the current folder but with more flexibility, as we won't
have to take care of the CSV files. Unfortunately this doesn't work with
PutHiveQL, as we have around 10'000-100'000 lines to insert per minute and
the PutHiveQL processor cannot keep up (we get 1-2 inserts per second). We
tried to use the Hive streaming processor, but despite our efforts we were
unable to make it work with Kerberos and HDP 2.5 (the NiFi processor seems
to require a "hive-metastore" principal that we don't have, and when we
create it, we still encounter Kerberos issues). Our last test was to use
the Presto Teradata JDBC driver with the PutSQL processor, but it doesn't
work because this driver is in auto-commit mode, which is incompatible with
the processor.

I'm currently trying to imagine a better flow that can work around these
limitations, and may try to use a SQL database as a buffer instead. That
should give me a better way to control when to truncate the table. Of
course, any thoughts/recommendations are appreciated.

Thanks!
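One common mitigation for the PutHiveQL insert rate, assuming the Hive version in use supports multi-row INSERT ... VALUES (Hive 0.14+), is to batch many rows into a single statement so each PutHiveQL invocation does more work. A hedged sketch of building such a statement; the table and column names are illustrative, and the naive quoting below is only for demonstration, not a substitute for proper escaping or parameterized SQL:

```python
def multi_row_insert(table, columns, rows):
    """Build one multi-row INSERT so PutHiveQL issues fewer statements."""
    def literal(v):
        # Minimal literal rendering: NULL, bare numbers, quoted strings.
        if v is None:
            return "NULL"
        if isinstance(v, (int, float)):
            return str(v)
        return "'" + str(v).replace("'", "''") + "'"

    values = ", ".join(
        "(" + ", ".join(literal(v) for v in row) + ")" for row in rows
    )
    return "INSERT INTO %s (%s) VALUES %s" % (table, ", ".join(columns), values)

stmt = multi_row_insert("landing", ["id", "name"], [(1, "a"), (2, "b")])
# INSERT INTO landing (id, name) VALUES (1, 'a'), (2, 'b')
```

In a flow, a MergeContent or scripted batching step could accumulate a minute's worth of rows, build one statement like this, and hand it to PutHiveQL.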

On Wed, Apr 5, 2017 at 10:16 PM, Matt Burgess  wrote:

> Arnaud,
>
> Can you explain more about what you'd like to do via an INSERT query?
> Are you trying to accomplish #3 using Hive via JDBC?  If so you should
> be able to use PutHiveQL rather than PutSQL. If you already have an
> external table in Hive and don't yet have the ORC table, you should be
> able to use a CREATE TABLE AS (CTAS) statement [1] in PutHiveQL.  If
> the ORC table exists and you want to insert from the external table,
> you can use INSERT INTO/OVERWRITE [2].  Apologies if I misunderstood
> what you are trying to do, if that's the case can you please
> elaborate?
>
> Per your comment that you can't trigger GetHDFS, consider using
> ListHDFS [3] and/or FetchHDFS [4] instead. If you know which files you
> want (from the flow), you don't need ListHDFS, rather you'd just set
> the filename attribute on the flow and route it to FetchHDFS.  Having
> said that, if you are already pulling the content of the HDFS files
> into NiFi, perhaps consider the ConvertAvroToORC [5] processor (if you
> can easily get your incoming data into Avro). This would allow you to
> convert to ORC within NiFi, then you can use PutHDFS to land the files
> on Hadoop, then PutHiveQL to create a table on top of the directory
> containing the ORC files.  If that is overkill, hopefully the
> PutHiveQL with the CTAS or INSERT statements will suffice.
>
> Regards,
> Matt
>
> [1] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#
> LanguageManualDDL-CreateTableAsSelect(CTAS)
> [2] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#
> LanguageManualDML-InsertingdataintoHiveTablesfromqueries
> [3] https://nifi.apache.org/docs/nifi-docs/components/org.
> apache.nifi.processors.hadoop.ListHDFS/index.html
> [4] https://nifi.apache.org/docs/nifi-docs/components/org.
> apache.nifi.processors.hadoop.FetchHDFS/index.html
> [5] https://nifi.apache.org/docs/nifi-docs/components/org.
> apache.nifi.processors.hive.ConvertAvroToORC/index.html
>
>
> On Wed, Apr 5, 2017 at 8:45 AM, Arnaud G  wrote:
> > Hi,
> >
> > I'm currently building a flow in Nifi and I'm trying to get the best way
> to
> > do it in a reliable manner:
> >
> > The setup is the following:
> >
> > 1) Some files are copied in a folder in HDFS
> > 2) An Hive external table point to this directory
> > 3) The dat

Re: apache nifi 1.1.2 startup really slow on openstack environment

2017-04-06 Thread Andre
Hi,

Glad it helped! Also note the fix has already been merged to 1.2.0

Cheers
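For anyone landing on this thread later: the workaround discussed here amounts to pointing the JVM at the non-blocking entropy source via an extra Java argument in conf/bootstrap.conf, along these lines (the argument index is illustrative; use the next unused java.arg number in your file, and note the `/./` is commonly needed because some JVMs special-case the literal path `file:/dev/urandom`):

```
java.arg.15=-Djava.security.egd=file:/dev/./urandom
```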

On 6 Apr 2017 11:53, "尹文才"  wrote:

> Hi Arnaud and Andre, thanks very much for your help; the NiFi cluster is
> now able to start up very fast after updating bootstrap.conf to use
> /dev/./urandom as an extra Java parameter. Let's just hope the fix for the
> problem will be included in a later NiFi version (1.2.0?).
>
> 2017-04-05 20:22 GMT+08:00 Arnaud G :
>
>> Hi!
>>
>> We had the same problem on Openstack (on clustered and non clustered
>> instances) and the problem can easily be solved by changing the entropy
>> source to /dev/urandom in the Java security parameters.
>>
>> Regards,
>>
>> On Wed, Apr 5, 2017 at 1:07 PM, 尹文才  wrote:
>>
>>> Thanks Andre, you mean the slow startup of a NiFi secure cluster on
>>> openstack could currently be remedied by updating bootstrap.conf to
>>> instruct NiFi to use /dev/urandom; I will give it a try tomorrow. By
>>> the way, NiFi starts up very slowly in standalone mode as well when
>>> deployed in an openstack environment. In my test, I deployed a single
>>> fresh NiFi node (non-secure) on openstack and it took around 7 minutes to
>>> come up. Do you have any ideas about this problem?
>>>
>>> 2017-04-05 17:48 GMT+08:00 Andre :
>>>
 Hi,

 Sorry for the abrupt message it was sent before I finished typing... :-)

 Some users have issues due to lack of entropy within the VMs. This was
 covered on 1.2.0-SNAPSHOT but users of old versions may need to configure
 entropy options manually.

 Let me know if it helps.

 Kind regards

 On Wed, Apr 5, 2017 at 7:45 PM, Andre  wrote:

> Hi,
>
> Can you please try the fix discussed here:
>
> https://issues.apache.org/jira/browse/NIFI-3313
>
> Kind regards
>
> On Wed, Apr 5, 2017 at 7:36 PM, 尹文才  wrote:
>
>>   Hi guys, I have been using Apache NiFi since version 0.7, and I
>> know from previous experience that NiFi should be up and running in
>> less than 1 minute (standalone mode). Recently my company switched from
>> running in VMware ESXi to openstack, and thus I reinstalled NiFi 1.1.2 in
>> openstack virtual machines (a 2-node cluster). To my surprise, NiFi
>> started up very slowly; it took almost 6-7 minutes, which is much slower
>> than before.
>>
>>   I actually was not very familiar with openstack and it was setup
>> and configured by someone else, but I know that there're 2 network
>> interfaces for the virtual machine, one is an internal IP(10.0.0.*, for
>> communication with machines inside openstack) and the other one is an
>> external IP(192.168.227.*, for communication with office network, I use
>> this IP to access the nifi web UI)
>>
>>   I then decided to check the logs(nifi-app.log) and I found that it
>> took the most time between the following 2 log lines:
>>
>> 2017-04-05 16:59:09,162 INFO [main] 
>> o.a.nifi.properties.NiFiPropertiesLoader
>> Loaded 121 properties from /usr/local/nifi/./conf/nifi.properties
>> 2017-04-05 17:03:48,810 INFO [main] 
>> o.a.n.admin.AuditDataSourceFactoryBean
>> Database not built for repository:
>>
>>   but the logs didn't give me any useful clues about why it would be so
>> slow. So I dived into NiFi's source code and tried to debug to find out
>> the reason; it appeared to me that the application spent most of its
>> time in the start method of the JettyServer class when it tries to start
>> the Jetty server:
>>
>> // start the server
>> server.start();
>>
>>   I checked the time it took to start the NiFi server in the log:
>>
>> 2017-04-05 17:08:55,149 INFO [main] org.apache.nifi.NiFi Controller 
>> initialization took 590449683493 nanoseconds.
>>
>>
>> the time '*590449683493 nanoseconds*' equals approximately to 10 minutes.
>>
>> I'm also not very familiar with Jetty, so I'm not sure what's going on
>> here. Did anyone come across this slow startup issue, or does anyone have
>> any idea about this problem? Thanks.
>>
>> btw, I've attached the logs.
>>
>>
>>
>
>

>>>
>>
>


Re: Options for increasing performance?

2017-04-06 Thread James McMahon
Thinking more on ways to conquer this problem, I believe I may attack it
from another perspective. Not a very refined one, somewhat Neanderthal, and
it doesn't really get to the heart of the matter. But I think it just may
work.

The behavior we consider slow is at the HandleHttpRequest processor. I have
no compelling reason that one single file in a known, related subset of
files must be sent per POST request. I'm going to try zipping up N files
and making the first processor step following HandleHttpRequest an unzip.

Jim
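Jim's batching idea can be sketched on the sender side as follows: bundle N small files into one in-memory zip and send it as a single POST body. This is a sketch under assumptions; the file names are illustrative, the POST step itself is only hinted at in a comment, and unpacking on the NiFi side could be done with an UnpackContent processor after HandleHttpRequest:

```python
import io
import zipfile

def bundle(files):
    """Zip a batch of (name, bytes) pairs into an in-memory archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files:
            zf.writestr(name, data)
    return buf.getvalue()

batch = [("a.json", b'{"id": 1}'), ("b.json", b'{"id": 2}')]
payload = bundle(batch)
# 'payload' would then be sent as one POST body (e.g. with urllib or
# requests), so N files cost one HTTP round trip instead of N.
```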

On Thu, Apr 6, 2017 at 6:00 AM, James McMahon  wrote:

> Intriguing. I'm one of those who have employed the "single flowfile"
> approach. I'm certainly willing to test out this refinement.
> So to press your point, this is more efficient than setting the
> processor's "Concurrent tasks" to 10 because it assumes the burden of
> initialization for ExecuteScript once, rather than using the processor
> configuration parm (which presumably assumes that initialization burden ten
> times)?
>
> I currently set "Concurrent tasks" to 50.  The logjam I am seeing is not
> in my ExecuteScript processor. My delay is definitely a non-steady,
> "non-fast" stream of data at my HandleHttpRequest processor, the first
> processor in my workflow. Why that is the case is a mystery we've yet to
> resolve.
>
> One thing I'd welcome is some idea of what is a reasonable expectation for
> requests handled by HandleHttpRequest in an hour? Maybe 1500 in an hour is
> low, high, or perhaps it is entirely reasonable. We really have little
> insight. Any empirical data from user practical experience would be most
> welcome.
>
> Also, I added a second HandleHttpRequest fielding requests on a second
> port. I did not see any level of improved throughput. Why might that be? My
> expectation was that with two doors open rather than one, I'd see some more
> influx of data.
>
> Thank you.
> - Jim
>
> On Wed, Apr 5, 2017 at 4:26 PM, Scott Wagner 
> wrote:
>
>> One of my experiences is that when using ExecuteScript and Python is that
>> having an ExecuteScript that works on an individual FlowFile when you have
>> multiple in the input queue is very inefficient, even when you set it to a
>> timer of 0 sec.
>>
>> Instead, I have the following in all of my Python scripts:
>>
>> flowFiles = session.get(10)
>> for flowFile in flowFiles:
>>     if flowFile is None:
>>         continue
>>     # Do stuff here
>>
>> That seems to improve the throughput of the ExecuteScript processor
>> dramatically.
>>
>> YMMV
>>
>> - Scott
>>
>> James McMahon 
>> Wednesday, April 5, 2017 12:48 PM
>> I am receiving POSTs from a Pentaho process, delivering files to my NiFi
>> 0.7.x workflow HandleHttpRequest processor. That processor hands the
>> flowfile off to an ExecuteScript processor that runs a python script. This
>> script is very, very simple: it takes an incoming JSON object and loads it
>> into a Python dictionary, and verifies the presence of required fields
>> using simple has_key checks on the dictionary. There are only eight fields
>> in the incoming JSON object.
>>
>> The throughput for these two processes is not exceeding 100-150 files in
>> five minutes. It seems very slow in light of the minimal processing going
>> on in these two steps.
>>
>> I notice that there are configuration options seemingly related to
>> optimizing performance. "Concurrent tasks", for example,  is only set by
>> default to 1 for each processor.
>>
>> What performance optimizations at the processor level do users recommend?
>> Is it advisable to crank up the concurrent tasks for a processor, and is
>> there an optimal performance point beyond which you should not crank up
>> that value? Are there trade-offs?
>>
>> I am particularly interested in optimizations for HandleHttpRequest and
>> ExecuteScript processors.
>>
>> Thanks in advance for your thoughts.
>>
>> cheers,
>>
>> Jim
>>
>>
>>
>
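Scott's batch-get pattern (quoted above) also relies on every pulled flow file being transferred to a relationship before the session commits. Here is a self-contained sketch of that control flow; the FakeSession class is only a stand-in for NiFi's ProcessSession so the loop can be exercised outside NiFi, and the processing step is left as a comment:

```python
class FakeSession(object):
    """Stand-in for NiFi's ProcessSession, only to exercise the loop below."""
    def __init__(self, queued):
        self.queued = list(queued)
        self.transferred = []

    def get(self, n):
        # Return up to n queued flow files, like session.get(10) in NiFi.
        batch, self.queued = self.queued[:n], self.queued[n:]
        return batch

    def transfer(self, flow_file, rel):
        self.transferred.append((flow_file, rel))

REL_SUCCESS = "success"

def on_trigger(session):
    # Pull up to 10 flow files per invocation instead of one,
    # amortizing the (expensive) Jython invocation overhead.
    flow_files = session.get(10)
    for flow_file in flow_files:
        if flow_file is None:
            continue
        # ... process flow_file here ...
        session.transfer(flow_file, REL_SUCCESS)  # every file must be routed

session = FakeSession(["ff1", "ff2", "ff3"])
on_trigger(session)
```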


Re: Options for increasing performance?

2017-04-06 Thread James McMahon
Intriguing. I'm one of those who have employed the "single flowfile"
approach. I'm certainly willing to test out this refinement.
So to press your point, this is more efficient than setting the processor's
"Concurrent tasks" to 10 because it assumes the burden of initialization
for ExecuteScript once, rather than using the processor configuration
parm (which presumably assumes that initialization burden ten times)?

I currently set "Concurrent tasks" to 50.  The logjam I am seeing is not in
my ExecuteScript processor. My delay is definitely a non-steady, "non-fast"
stream of data at my HandleHttpRequest processor, the first processor in my
workflow. Why that is the case is a mystery we've yet to resolve.

One thing I'd welcome is some idea of what is a reasonable expectation for
requests handled by HandleHttpRequest in an hour? Maybe 1500 in an hour is
low, high, or perhaps it is entirely reasonable. We really have little
insight. Any empirical data from user practical experience would be most
welcome.

Also, I added a second HandleHttpRequest fielding requests on a second
port. I did not see any level of improved throughput. Why might that be? My
expectation was that with two doors open rather than one, I'd see some more
influx of data.

Thank you.
- Jim

On Wed, Apr 5, 2017 at 4:26 PM, Scott Wagner 
wrote:

> One of my experiences is that when using ExecuteScript and Python is that
> having an ExecuteScript that works on an individual FlowFile when you have
> multiple in the input queue is very inefficient, even when you set it to a
> timer of 0 sec.
>
> Instead, I have the following in all of my Python scripts:
>
> flowFiles = session.get(10)
> for flowFile in flowFiles:
>     if flowFile is None:
>         continue
>     # Do stuff here
>
> That seems to improve the throughput of the ExecuteScript processor
> dramatically.
>
> YMMV
>
> - Scott
>
> James McMahon 
> Wednesday, April 5, 2017 12:48 PM
> I am receiving POSTs from a Pentaho process, delivering files to my NiFi
> 0.7.x workflow HandleHttpRequest processor. That processor hands the
> flowfile off to an ExecuteScript processor that runs a python script. This
> script is very, very simple: it takes an incoming JSON object and loads it
> into a Python dictionary, and verifies the presence of required fields
> using simple has_key checks on the dictionary. There are only eight fields
> in the incoming JSON object.
>
> The throughput for these two processes is not exceeding 100-150 files in
> five minutes. It seems very slow in light of the minimal processing going
> on in these two steps.
>
> I notice that there are configuration options seemingly related to
> optimizing performance. "Concurrent tasks", for example,  is only set by
> default to 1 for each processor.
>
> What performance optimizations at the processor level do users recommend?
> Is it advisable to crank up the concurrent tasks for a processor, and is
> there an optimal performance point beyond which you should not crank up
> that value? Are there trade-offs?
>
> I am particularly interested in optimizations for HandleHttpRequest and
> ExecuteScript processors.
>
> Thanks in advance for your thoughts.
>
> cheers,
>
> Jim
>
>
>


Re: Options for increasing performance?

2017-04-06 Thread James McMahon
Thanks very much Juan. I do not find Wireshark in the apps available to me,
but will ask our infrastructure folks about that this morning. -Jim

On Wed, Apr 5, 2017 at 4:01 PM, Juan Sequeiros  wrote:

> If you have wireshark you could use:
>
>  tshark -f "port 8446 or port 9448"
>
>
> On Wed, Apr 5, 2017 at 3:45 PM James McMahon  wrote:
>
>> Thank you Bryan. I will explore these things. I suspect we are not
>> receiving from the source optimally. The reason I say that is this: I am
>> doing manual refreshes on my flow page every 3 to 4 seconds. Frequently I
>> go through 3 or 4 refreshes and no figures change in my queues or in my
>> processors. It seems like my workflow is just sitting there waiting for
>> new arrivals.
>> I am using ports 8446 and 9448 (I have two HandleHttpRequest processors
>> now). Does anyone know of a few commands I can use to monitor incoming
>> POSTs arriving at my ports? Is this something I can monitor using the FF
>> developer features? -Jim
>>
>> On Wed, Apr 5, 2017 at 3:39 PM, Bryan Rosander 
>> wrote:
>>
>> This seems to have gotten lost in the chain, resending (please disregard
>> if you've already read/tried it):
>>
>> Another thing to consider is whether the bottleneck is in NiFi or before
>> it gets there.  Is the source of data capable of making post requests more
>> quickly than that as configured? Is network latency or throughput a
>> limitation?  You might try  posting to another http server to see whether
>> the problem is within NiFi.
>>
>> E.g. modify something like https://gist.github.com/bradmontgomery/2219997 to
>> log requests and see if the rate is similar even when no other processing
>> is done on the server side.
>>
>> If you go with the python server, you may want to use the threading mixin
>> as well.
>>
>> http://stackoverflow.com/questions/14088294/multithreaded-web-server-in-
>> python
>>
>> Thanks,
>> Bryan
>>
>> On Wed, Apr 5, 2017 at 3:19 PM, James McMahon 
>> wrote:
>>
>> We are not seeing 503s. We have tried setting up a second
>> HandleHttpRequest, watching a different port, and round-robining to the
>> two ports. We made a relatively low gain, from about 5 minutes for 100
>> files consistently to 4:40 for 100. I watch my workflow, and at no point
>> does a large number of flowfiles queue up in any queue leading into or
>> coming out of any processor.
>>
>> On Wed, Apr 5, 2017 at 2:44 PM, Bryan Rosander 
>> wrote:
>>
>> It looks like HandleHttpRequest should be sending back a 503 if its
>> containerQueue fills up (default capacity of 50 requests that have been
>> accepted but not processed in an onTrigger()) [1].  Also, the default
>> thread pool the jetty server is using should be able to create up to 200
>> threads to accept connections and the handler is using an async context so
>> the in-flight flow files shouldn't be holding up new requests.
>>
>> If you're not seeing 503s it might be on the sender side of the
>> equation.  Is the sender doing posts concurrently or waiting on each to
>> complete before sending another?
>>
>> [1] https://github.com/apache/nifi/blob/rel/nifi-0.7.0/nifi-
>> nar-bundles/nifi-standard-bundle/nifi-standard-
>> processors/src/main/java/org/apache/nifi/processors/
>> standard/HandleHttpRequest.java#L395
>>
>> On Wed, Apr 5, 2017 at 2:27 PM, Joe Witt  wrote:
>>
>> Much of this goodness can be found in the help->Users Guide.
>> Adjusting run duration/scheduling factors:
>>   https://nifi.apache.org/docs/nifi-docs/html/user-guide.
>> html#scheduling-tab
>>
>> These are the latest docs but I'm sure there is coverage in the older
>> stuff.
>>
>> Thanks
>>
>> On Wed, Apr 5, 2017 at 2:23 PM, James McMahon 
>> wrote:
>> > Yes sir! Sure am. And I know, because I have committed that very silly
>> > mistake before. We are indeed seeing # responses = # requests  -Jim
>> >
>> > On Wed, Apr 5, 2017 at 2:13 PM, Bryan Rosander 
>> wrote:
>> >>
>> >> Hey James,
>> >>
>> >> Are you making sure that every route from HandleHttpRequest goes to a
>> >> HandleHttpResponse?  If not, the StandardHttpContextMap may be filling
>> up
>> >> with requests which would probably delay processing.
>> >>
>> >> Thanks,
>> >> Bryan
>> >>
>> >> On Wed, Apr 5, 2017 at 2:07 PM, James McMahon 
>> >> wrote:
>> >>>
>> >>> Thank you very much Matt. I have cranked my Concurrent Tasks config
>> parm
>> >>> on my ExecuteScripts up to 20, and judging by the empty queue feeding
>> that
>> >>> processor it is screaming through the flowfiles arriving at its
>> doorstep.
>> >>>
>> >>> Can anyone comment on performance optimizations for
>> HandleHttpRequest? In
>> >>> your experiences, is HandleHttpRequest a bottleneck? I do notice that
>> I
>> >>> often have a count in the processor for "flowfile in process" within
>> the
>> >>> processor. Anywhere from 1 to 10 when it does show such a count.
>> >>>
>> >>> -Jim
>> >>>
>> >>> On Wed, Apr 5, 2017 at 1:52 PM, Matt Burgess 
>> >>> wrote:
>> 
>>  Jim,
>> 
>>  One quick thing you can try is to use GenerateFlowF

How can datetime to month conversion failed in french language?

2017-04-06 Thread prabhu Mahendran
In NiFi, how does the JVM determine the language of the machine?

Does it take a default language like English (US), or the system's selected
date/time language?

I face an issue while converting a datetime format into a month using
expression language, with the NiFi package installed on a French OS.

But it worked with English (US) as the selected language.

Can anyone help me to resolve this?
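The follow-up in this thread formats the date with 'MMM', which the JVM's SimpleDateFormat renders using the default platform locale, so a French OS produces 'avr.' where an English one produces 'Apr'. One locale-independent workaround is to map the month number explicitly, sketched here in Python (the same idea could live in a NiFi ExecuteScript; the function name is illustrative):

```python
from datetime import datetime

# Java's format('MMM') follows the JVM default locale, so the same date
# renders as 'Apr' on an English system but 'avr.' on a French one.
# Mapping the month number explicitly avoids the locale dependency.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def month_abbrev(ds, fmt="%d/%m/%Y"):
    """Return an English month abbreviation regardless of OS language."""
    return MONTHS[datetime.strptime(ds, fmt).month - 1]

month_abbrev("07/04/2017")  # 'Apr'
```

Alternatively, forcing the JVM locale (for example with `-Duser.language=en` as an extra argument in bootstrap.conf) may keep the expression-language `format('MMM')` output in English, though that changes locale behavior for the whole instance.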