Re: How backup works when flow.xml size more than max storage?

2017-01-19 Thread Koji Kawamura
Hi Prabhu,

Thanks for the confirmation. I can't guarantee that it will be included in the
next release, but I'll try my best :) You can watch the JIRA to get updates
as it proceeds.
https://issues.apache.org/jira/browse/NIFI-3373

Thanks,
Koji


Re: How backup works when flow.xml size more than max storage?

2017-01-19 Thread prabhu Mahendran
Hi Koji,

Both simulations look perfect. I expected exactly this behavior, and it
matches my requirement; it also sounds logical. Shall I expect these changes
in the next NiFi release version?


Thank you so much for this tremendous support.



Re: How backup works when flow.xml size more than max storage?

2017-01-19 Thread Koji Kawamura
Hi Prabhu,

In that case, yes, as you assumed: even if the latest archive exceeds
500MB, the latest archive is saved, as long as it was written to disk
successfully.

After that, when the user updates the NiFi flow, the previous archive will
be removed before the new one is created, because max.storage is exceeded.
Then the latest will be archived.

Let's simulate the scenario with the to-be-updated logic by NIFI-3373,
in which the size of flow.xml keeps increasing:

# CASE-1

archive.max.storage=10MB
archive.max.count = 5

Time | flow.xml | archives | archive total |
t1 | f1 5MB  | f1 | 5MB
t2 | f2 5MB  | f1, f2 | 10MB
t3 | f3 5MB  | f1, f2, f3 | 15MB
t4 | f4 10MB | f2, f3, f4 | 20MB
t5 | f5 15MB | f4, f5 | 25MB
t6 | f6 20MB | f6 | 20MB
t7 | f7 25MB | f7 | 25MB

* t3: f3 is archived even though the total exceeds 10MB, because f1 + f2 <=
10MB. A WARN message starts to be logged from this point, because the total
archive size > 10MB.
* t4: The oldest archive, f1, is removed, because f1 + f2 + f3 > 10MB.
* t5: Even if the flow.xml size exceeds max.storage, the latest archive is
created. f4 is kept because f4 <= 10MB.
* t6: f4 and f5 are removed because f4 + f5 > 10MB, and also f5 > 10MB on its own.

In this case, NiFi will keep logging a WARN (or should it be ERROR??) message
indicating the archive storage size is exceeding the limit, from t3 onward.
After t6, even if archive.max.count = 5, NiFi will only keep the
latest flow.xml.

# CASE-2

If you'd like to keep at least 5 archives no matter what, then leave
max.storage and max.time blank.

archive.max.storage=
archive.max.time=
archive.max.count = 5 // Only limit archives by count

Time | flow.xml | archives | archive total |
t1 | f1 5MB  | f1 | 5MB
t2 | f2 5MB  | f1, f2 | 10MB
t3 | f3 5MB  | f1, f2, f3 | 15MB
t4 | f4 10MB | f1, f2, f3, f4 | 25MB
t5 | f5 15MB | f1, f2, f3, f4, f5 | 40MB
t6 | f6 20MB | f2, f3, f4, f5, f6 | 55MB
t7 | f7 25MB | f3, f4, f5, f6, (f7) | 50MB, (75MB)
t8 | f8 30MB | f3, f4, f5, f6 | 50MB

* From t6, the oldest archive is removed to keep the number of archives <= 5.
* At t7, if the disk has only 60MB of space, f7 won't be archived. And
after this point, the archive mechanism stops working (it keeps trying to
create a new archive, but keeps getting an exception: no space left on device).
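
Both cases above follow the same trimming loop. To make it concrete, here is a
minimal sketch of that logic in Java, as I read it from this thread (an
illustration of the intended NIFI-3373 behavior, not the actual NiFi source;
all names are made up): existing archives are dropped oldest-first until both
limits hold, and the newest archive itself is always kept.

import java.util.ArrayDeque;
import java.util.Deque;

public class ArchiveTrimSketch {

    record Archive(String name, long sizeBytes) {}

    private final Deque<Archive> archives = new ArrayDeque<>(); // oldest first
    private final long maxStorageBytes; // blank property => Long.MAX_VALUE
    private final int maxCount;         // blank property => Integer.MAX_VALUE

    public ArchiveTrimSketch(long maxStorageBytes, int maxCount) {
        this.maxStorageBytes = maxStorageBytes;
        this.maxCount = maxCount;
    }

    public void archive(Archive latest) {
        long existingTotal = archives.stream().mapToLong(Archive::sizeBytes).sum();
        // Drop the oldest archives until what remains fits both limits
        // (leaving room for the new archive in the count).
        while (!archives.isEmpty()
                && (existingTotal > maxStorageBytes || archives.size() >= maxCount)) {
            existingTotal -= archives.removeFirst().sizeBytes();
        }
        archives.addLast(latest); // the latest archive is always kept (soft limit)
        if (existingTotal + latest.sizeBytes() > maxStorageBytes) {
            System.err.println("WARN: total archive size exceeds max.storage");
        }
    }

    public static void main(String[] args) {
        // Replays CASE-1: 10MB max storage, max count 5, growing flow.xml.
        ArchiveTrimSketch sketch = new ArchiveTrimSketch(10L * 1024 * 1024, 5);
        long mb = 1024 * 1024;
        long[] sizesMb = {5, 5, 5, 10, 15, 20, 25};
        for (int i = 0; i < sizesMb.length; i++) {
            sketch.archive(new Archive("f" + (i + 1), sizesMb[i] * mb));
            System.out.println("t" + (i + 1) + " -> " + sketch.archives);
        }
    }
}

Running the replay prints the same archive sets as the CASE-1 table
(f1 | f1,f2 | f1,f2,f3 | f2,f3,f4 | f4,f5 | f6 | f7).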

In either case above, once flow.xml has grown to that size, some human
intervention would be needed.
Do those simulations look reasonable?

Thanks,
Koji


Re: ListFile, FetchFile Scalability

2017-01-19 Thread Joe Skora
Mark,

This thread shook something loose in my brain from when the state changes
were made.  Testing it out, I could easily create a case where the
two-timestamp approach was insufficient to avoid missing files.  The hard
part was making a unit test for it, which I eventually succeeded at.

I filed a Jira for it, NIFI-3332, with the unit test, but the basic scenario
is that if the processor runs while the system is writing a batch of files
with the same timestamp, the processor will pick up what has already been
written but then ignore the remainder of the batch on the next iteration.
It is an edge case, but I can definitely see it happening on a system under
load if data is transferred in from other places and then rolled into NiFi.
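
To make the edge case concrete, here is a toy, self-contained illustration (a
simplified strictly-newer filter, not ListFile's actual two-timestamp logic)
of the remainder of a same-timestamp batch being missed:

import java.util.List;

public class SameTimestampMiss {

    record FileInfo(String name, long modTime) {}

    // Naive listing filter: only files strictly newer than the last-seen time.
    static List<FileInfo> listNewerThan(List<FileInfo> dir, long lastSeen) {
        return dir.stream().filter(f -> f.modTime() > lastSeen).toList();
    }

    public static void main(String[] args) {
        long t = 1_000L;
        // Run 1: only a.txt from the batch has been written so far; it is
        // listed, and the stored state becomes lastSeen = t.
        List<FileInfo> run1 = listNewerThan(List.of(new FileInfo("a.txt", t)), 0L);
        long lastSeen = run1.get(0).modTime();
        // b.txt lands afterwards with the same timestamp t (same batch).
        List<FileInfo> run2 = listNewerThan(
                List.of(new FileInfo("a.txt", t), new FileInfo("b.txt", t)), lastSeen);
        System.out.println(run2); // [] -- b.txt is never listed
    }
}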

Can you take a look at that ticket and let me know what you think?

Thanks,
Joe

On Tue, Jan 10, 2017 at 10:09 PM, James McMahon 
wrote:

> These have been invaluable insights Mark. Thank you very much for your
> help. -Jim
>
> On Tue, Jan 10, 2017 at 2:13 PM, Mark Payne  wrote:
>
>> Jim,
>>
>> Off the top of my head, I don't remember the reason for two dates,
>> specifically. I think it may have had to do
>> with ensuring that if we run at time X, we could potentially pick up a
>> file that also has a timestamp of X. Then,
>> we could potentially have 1+ files come in at time X also, after the
>> processor finished running. If we only looked
>> at the one timestamp, we could miss those 1+ files that came in later,
>> but during the same second or millisecond
>> or whatever precision your operating system provides for file
>> modification times. Someone else on the list
>> may have more insight into the exact meaning of the two timestamps, as I
>> didn't come up with the algorithm.
>>
>> Yes, the ListFile processor will scan through the directory each time
>> that it runs to find any new files. Would recommend
>> that you not schedule ListFile to run with the default "0 sec" run
>> schedule but instead set it to something like "1 min" or
>> however often you can afford/need to. I believe that if it is scheduled
>> to run too frequently, it will actually yield itself,
>> which would cause it to 'pause' for 1 second (by default; this is
>> configured in the Settings for the Processor as well).
>>
>> The files that you mention there are simply the internals of the
>> Write-Ahead Log. When the WAL is updated,
>> it picks a partition to write the update to (the partition directories) and
>> appends to whichever journal file it is
>> currently writing to. If we did this forever, those files would grow
>> indefinitely and aside from running out of disk
>> space, restarting NiFi would take ages. So periodically (by default,
>> every 2 minutes), the WAL is checkpointed.
>>
>> When this happens it creates the 'snapshot' file and writes to the file
>> the current state of the system and then
>> starts a new journal file for each partition. So there's a 'snapshot'
>> file that is a snapshot of the system state
>> and then the journal files that indicate a series of changes from the
>> snapshot to get back to the most recent
>> state.
>>
>> You may occasionally see some other files, such as multiple journal
>> files, snapshot.part files, etc. that are temporary
>> artifacts generated in order to provide better performance and ensure
>> reliability across system crashes/restarts.
>>
>> The wali.lock is simply there to ensure that we don't start NiFi twice
>> and have 2 different processes trying to write to
>> those files at the same time.
>>
>> Hope this helps!
>>
>> Thanks
>> -Mark
>>
>>
>> On Jan 10, 2017, at 10:01 AM, James McMahon  wrote:
>>
>> Thank you very much Mark. This is very helpful. Can I ask you just a few
>> quick follow-up questions in an effort to better understand?
>>
>> How does NiFi use those two dates? It seems that the timestamp of last
>> listing would be sufficient to permit NiFi to identify newly received
>> content. Why is it necessary to maintain the timestamp of the most recent
>> file it has sent out?
>>
>> How does NiFi quickly determine which files throughout the nested
>> directory structure were received after the last date it logged? Is it
>> scanning through listings of all the associated directories flagging for
>> processing those files with later dates?
>>
>> I looked more closely at my ./state/local directory and subdirectories.
>> Can you offer a few words about the purpose of each of the following?
>> * file snapshot
>> * file wali.lock
>> * the partition[0-15] subdirectories, each of which appears to own a
>> journal file
>> * the journal file
>> Where are the dates you referenced?
>>
>> Thank you again for your insights.
>>
>> On Tue, Jan 10, 2017 at 8:51 AM, Mark Payne  wrote:
>>
>>> Hi Jim,
>>>
>>> ListFile does not maintain a list of files w/ datetime stamps. Instead,
>>> it stores just two timestamps:
>>> the 

RE: How to print flow

2017-01-19 Thread Lee Laim (leelaim)
Alessio,

When you are at the zoom level you want to capture, grab your camera, or 
right-click the NiFi banner to bring up a standard print dialog. Ctrl+P* works 
too.

If the processor names are too small to resolve, you can either zoom in or 
add labels behind them with a large font size. This labeling can be done 
manually, programmatically with the NiFi API [1] (see the sketch below), or 
even with a script on the saved template.xml file.

[1] https://nifi.apache.org/docs/nifi-docs/rest-api/
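
For the API route, a rough sketch of creating a big label over a process group
via REST (an illustration only: the endpoint below is the 1.x-style labels
resource, and the host, process-group id, and entity fields are assumptions to
verify against the REST docs for your version):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AddLabelSketch {
    public static void main(String[] args) throws Exception {
        String pgId = "root"; // placeholder process group id
        String body = """
                {"revision": {"version": 0},
                 "component": {"label": "Ingest section",
                               "style": {"font-size": "30px"}}}""";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/nifi-api/process-groups/"
                        + pgId + "/labels"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}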

Cheers,
Lee

*NiFi 0.7.1/Chrome



Re: Storage buffer separation in a multi-tenant cluster.

2017-01-19 Thread Kristopher Kane
I'm thinking that NiFi uses in the wild are project-oriented with regard to
resources, and it is not presented as an enterprise platform where the
multi-tenant risks are of concern.

On Thu, Jan 12, 2017 at 3:28 PM, Kristopher Kane 
wrote:

> I work with a medium sized Storm cluster that is used by many tenants.  As
> the admins of the Storm cluster we must be mindful of network and CPU IO
> and adjust, manually, based on usage.  Many of these Storm uses would be a
> better fit with NiFi's inbuilt capabilities and ease of use whilst leaving
> the high throughput work in Storm.  Storm works really well out of the box
> with many (dozens) of separate users across hundreds of topologies. We
> simply add more nodes and don't have to worry much about load and users
> walking over each other, since our failure replay is always from Kafka.
>
> What isn't obvious to me is how local buffer storage is handled in a
> multi-tenant NiFi cluster, and I am wondering if others have patterns out
> there to prevent a NiFi user from eating up available disk, thus downing
> other users' workflows.
>
> My initial thought is a management layer outside of NiFi that invokes
> Linux FS quotas by user account.  Does NiFi have anything built in for this
> type of preventive measure?
>
> Thanks,
>
> Kris
>


Re: How to print flow

2017-01-19 Thread Oleg Zhurakousky
Alessio

Outside of a screenshot I am not sure you have many options, at least at the 
moment.
Printing something like a flow is more complicated than it may seem at first, 
due to formatting issues: landscape or portrait, the paper size, etc., and what 
if the flow doesn't fit? Should it get auto-resized or spread across multiple 
pages when you are trying to print a large flow?
On top of that, each flow in NiFi may, and often does, use components that are 
not visible unless specifically accessed. For example, a flow may contain local 
and/or remote process groups, ControllerServices, etc., which aren't visible 
when looking at the flow (i.e., ControllerServices). The same goes for a 
process group, which is just a box, yet when you click on it, it opens up 
another flow, etc.

Anyway, I know this is not much help, but as you can see it needs more thought 
put into it ;)

Cheers
Oleg




Re: GetMongo Processor Alternative

2017-01-19 Thread Bryan Bende
Hi Pablo,

Usually the convention is that a "Get" processor is a source processor,
meaning it doesn't take any input. These are usually scheduled to run on
some interval using the timer or cron scheduling.

When a processor is triggered by an incoming flow file, we usually call it
a "Fetch" processor. Currently there isn't a FetchMongoDB, unless someone
has one outside the NiFi code-base, but feel free to create a JIRA for that.
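
To illustrate the distinction in code, here is a hypothetical sketch of a
"Fetch"-style processor (FetchMongoDB doesn't exist per the above, so the
class name, attribute name, and elided query are made up; it compiles against
the nifi-api dependency):

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

public class FetchMongoDBSketch extends AbstractProcessor {

    static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success")
            .description("Fetched documents are routed here")
            .build();

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session)
            throws ProcessException {
        // A source "Get" processor would ignore the input queue entirely;
        // a "Fetch" processor is driven by the flow file that arrives.
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return; // no input, nothing to fetch
        }
        // Derive the query from an incoming attribute (name is made up here).
        String recordId = flowFile.getAttribute("mongo.record.id");
        // ... query MongoDB for recordId, write the result to the flow file
        // content (omitted), then route onward ...
        session.transfer(flowFile, REL_SUCCESS);
    }
    // A real processor would also override getRelationships() and declare
    // property descriptors for the connection details.
}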

Thanks,

Bryan

On Thu, Jan 19, 2017 at 4:35 AM, Pablo Lopez 
wrote:

> Hi,
>
> I've just managed to get the PutMongo processor to work successfully.
>
> However, I just realized that you can't use the GetMongo processor to
> retrieve data based on the input from another flowfile or attribute. It has
> no input. That leaves the Mongo database that I sent data to a bit
> orphaned if you can't retrieve data based on another source.
>
> Does anybody have an alternative on how to get a specific record from
> MongoDB based on an input?
>
> Thanks,
> Pablo.
>


Re: keytab file does not exists but actually it does

2017-01-19 Thread Alessio Palma
I dropped and re-created the same processor, and for some unknown reason it 
worked.  Very strange.







Re: keytab file does not exists but actually it does

2017-01-19 Thread Alessio Palma
Hi Pierre,

yes, my Kerberos configuration on the host is fine; I can use kinit to get a 
ticket and ktutil to create keytabs with no issue.


From: Pierre Villard 
Sent: Thursday, January 19, 2017 12:25:24 PM
To: users@nifi.apache.org
Subject: Re: keytab file does not exists but actually it does

Hello Alessio,

Is your krb5.conf correct and correctly referenced in your nifi.properties file?

Pierre

2017-01-19 12:20 GMT+01:00 Alessio Palma 
>:

Hello all,

I'm getting a strange error from a controller service (Hive connection pool); 
the error says:


Kerberos Keytab validated against 'myfile...' is invalid because File 
'myfile...' does not exist.


But the keytab file exists and has the correct permissions.

What is going wrong? How can I debug this issue?
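
One way to narrow this down: check the path exactly as the NiFi JVM and its
user see it. A tiny standalone check like the sketch below (the path is a
placeholder), run as the same user that runs NiFi, can surface permission and
relative-path surprises: java.io.File.exists() returns false not only when the
file is missing, but also when a parent directory is not traversable by that
user, and a relative path resolves against the JVM's working directory.

import java.io.File;

public class KeytabCheck {
    public static void main(String[] args) {
        // Pass the exact path configured in the controller service.
        File keytab = new File(args.length > 0 ? args[0] : "/path/to/my.keytab");
        System.out.println("absolute: " + keytab.getAbsolutePath());
        System.out.println("exists:   " + keytab.exists());
        System.out.println("readable: " + keytab.canRead());
    }
}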





Re: How backup works when flow.xml size more than max storage?

2017-01-19 Thread prabhu Mahendran
Hi Koji,

Thanks for your information.

Actually the task description looks fine. I have one question here:
consider that the storage limit is 500MB, and suppose my latest workflow
exceeds this limit. Which behavior is performed with respect to the
properties (max.count, max.time and max.storage)? My assumption is that the
latest archive is saved even if it exceeds 500MB, so what happens from
here? Will it keep saving only the single latest archive with the large
size, or will it notify the user to increase the size and preserve the
latest file until we restart the flow? And if the size keeps increasing
past 500MB, will it save archives based on count, or only the latest
archive, for as long as NiFi is in running status?

Many thanks

On Thu, Jan 19, 2017 at 12:47 PM, Koji Kawamura 
wrote:

> Hi Prabhu,
>
> Thank you for the suggestion.
>
> Keeping the latest N archives is nice; it's simple :)
>
> The max.time and max.storage settings have other benefits, and since they
> are already released, we should keep the existing behavior with these
> settings, too. I've created a JIRA to add an archive.max.count property.
> https://issues.apache.org/jira/browse/NIFI-3373
>
> Thanks,
> Koji
>
> On Thu, Jan 19, 2017 at 2:21 PM, prabhu Mahendran
>  wrote:
> > Hi Koji,
> >
> >
> > Thanks for your reply,
> >
> > Yes, Solution B may meet my requirement. Currently, if the storage size
> > is met, the complete folder is deleted and the new flow is not tracked in
> > the archive folder. That behavior is the drawback here. I need at least
> > the last workflow to be saved in the archive folder, and the user notified
> > to increase the size. At the same time, until NiFi restarts, at least the
> > last complete workflow should be backed up.
> >
> >
> > Another suggestion of mine is as follows:
> >
> >
> > Regardless of the max.time and max.storage properties, can we keep only a
> > few files in the archive (consider only 10 files)? Each action from the
> > NiFi canvas should be tracked here; when the count of flow.xml.gz archive
> > files is reached, the oldest file should be deleted and the latest file
> > saved, so that the count of 10 is maintained. This way we can maintain the
> > workflow properly, and backup is also achieved, without confusion between
> > max.time and max.storage. The only remaining case is when the disk size is
> > exceeded; we should notify the user about this.
> >
> >
> > Many thanks.
> >
> >
> > On Thu, Jan 19, 2017 at 6:36 AM, Koji Kawamura 
> > wrote:
> >>
> >> Hi Prabhu,
> >>
> >> Thanks for sharing your experience with flow file archiving.
> >> The case that a single flow.xml.gz file size exceeds
> >> archive.max.storage was not considered well when I implemented
> >> NIFI-2145.
> >>
> >> By looking at the code, it currently works as follows:
> >> 1. The original conf/flow.xml.gz (> 1MB) is archived to conf/archive
> >> 2. NiFi checks if there are any expired archive files, and deletes them
> >> if any
> >> 3. NiFi checks the total size of all archived files, then deletes
> >> the oldest archive, repeating until the total size becomes less
> >> than or equal to the configured archive.max.storage.
> >>
> >> In your case, at step 3, the newly created archive is deleted, because
> >> its size was greater than archive.max.storage.
> >> In this case, NiFi only logs an INFO-level message, and it's hard for
> >> the user to know what happened, as you reported.
> >>
> >> I'm going to create a JIRA for this, and fix the current behavior by
> >> either one of the following solutions:
> >>
> >> A. treat archive.max.storage as a HARD limit. If the original
> >> flow.xml.gz exceeds the configured archive.max.storage in size, then throw
> >> an IOException, which results in a WARN-level log message "Unable to
> >> archive flow configuration as requested due to ...".
> >>
> >> B. treat archive.max.storage as a SOFT limit. By not including the
> >> newly created archive file at steps 2 and 3 above, so that it can
> >> stay there. Maybe a WARN-level log message should be logged.
> >>
> >> For a better user experience, I'd prefer solution B, so that the flow can
> >> be archived even if flow.xml.gz exceeds the archive storage size, since it
> >> was able to be written to disk, which means the physical disk had
> >> enough space.
> >>
> >> What do you think?
> >>
> >> Thanks!
> >> Koji
> >>
> >> On Wed, Jan 18, 2017 at 3:27 PM, prabhu Mahendran
> >>  wrote:
> >> > I have checked the below properties used for the backup operations in
> >> > NiFi 1.0.0
> >> > with respect to this JIRA.
> >> >
> >> > https://issues.apache.org/jira/browse/NIFI-2145
> >> >
> >> > nifi.flow.configuration.archive.max.time=1 hours
> >> > nifi.flow.configuration.archive.max.storage=1 MB
> >> >
> >> > Since we have two backup locations, the first one is "conf/flow.xml.gz"
> >> > and the second is
> >> > "conf/archive/flow.xml.gz"
> >> >
> >> > I have saved archived workflows (conf/archive/flow.xml.gz) per the
> >> > hours in the
> >> > "max.time" property.
> >> >
> >> > At a particular time I have reached "1 

How to print flow

2017-01-19 Thread Alessio Palma
Hello all,
has anybody found a way to print a workflow?

I.e., a tool to convert the flow into another format which is readable by other 
software with printing support.
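
One possible starting point for such a tool (a sketch only; the element names
are assumptions based on typical flow.xml files and should be verified against
your own): the canvas is persisted in conf/flow.xml.gz, so a small program can
decompress and parse it and emit another format, e.g. a plain list of
processor names to feed into something printable.

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FlowToText {
    public static void main(String[] args) throws Exception {
        try (InputStream in = new GZIPInputStream(
                Files.newInputStream(Path.of("conf/flow.xml.gz")))) {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(in);
            // Each <processor> element carries a <name> child in typical
            // flow.xml files (assumption -- check your file's structure).
            NodeList processors = doc.getElementsByTagName("processor");
            for (int i = 0; i < processors.getLength(); i++) {
                Element p = (Element) processors.item(i);
                NodeList names = p.getElementsByTagName("name");
                if (names.getLength() > 0) {
                    System.out.println(names.item(0).getTextContent());
                }
            }
        }
    }
}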