RE: [EXT] Re: ReplaceText Flow File Processing Count

2018-05-08 Thread Peter Wicks (pwicks)
https://github.com/apache/nifi/pull/2687


-Original Message-
From: Joe Witt [mailto:joe.w...@gmail.com] 
Sent: Friday, May 04, 2018 21:19
To: dev@nifi.apache.org
Subject: [EXT] Re: ReplaceText Flow File Processing Count

Bryan's guess on the history is probably right, but more to the point, with what 
we have available these days (the record processors and so on) I think we 
should just change it back to one.  Peter's statement on user expectation I 
agree with for sure.  Any chance you want to file that JIRA/PR, Peter?

On Fri, May 4, 2018 at 9:13 AM, Bryan Bende  wrote:
> I don't know the history of this particular processor, but I think the 
> purpose of the session.get() with batches is similar to the concept of 
> @SupportsBatching. Basically both of them should have better 
> performance because you are handling multiple flow files in a single 
> session. The supports batching concept is a bit more flexible as it is 
> configurable by the user, where as this case is hard-coded into the 
> processor.
>
> I suppose if there is some reason why you need to process 1 flow file 
> at a time, you could set the back-pressure threshold to 1 on the queue 
> leading into ReplaceText.
>
> On Fri, May 4, 2018 at 3:50 AM, Peter Wicks (pwicks)  
> wrote:
>> Had a user notice today that a ReplaceText processor, scheduled to run every 
>> 20 minutes, had processed all 14 files in queue at once. I looked at the 
>> code and see that ReplaceText does not do a standard session.get, but 
>> instead calls:
>>
>> final List<FlowFile> flowFiles = 
>> session.get(FlowFileFilters.newSizeBasedFilter(1, DataUnit.MB, 100));
>>
>> Was there a design reason behind this? To us it was just really confusing 
>> that we didn't have full control over how quickly FlowFiles move through 
>> this processor.
>>
>> Thanks,
>>   Peter
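For anyone curious why all 14 files went through at once: here is a standalone sketch (plain Java, no NiFi dependencies; class and method names are illustrative, not the real implementation) of the cut-off logic that `FlowFileFilters.newSizeBasedFilter(1, DataUnit.MB, 100)` applies — keep pulling queued flow files until roughly 1 MB of content or 100 files have been accepted, whichever comes first.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: models the semantics of
// FlowFileFilters.newSizeBasedFilter(1, DataUnit.MB, 100) -- accept queued
// flow files until ~1 MB of content or 100 files have been selected.
public class SizeBasedBatchDemo {

    static List<Long> selectBatch(List<Long> queuedSizes, long maxBytes, int maxCount) {
        List<Long> selected = new ArrayList<>();
        long total = 0;
        for (long size : queuedSizes) {
            if (selected.size() >= maxCount || total >= maxBytes) {
                break; // the filter stops accepting once either limit is hit
            }
            selected.add(size);
            total += size;
        }
        return selected;
    }

    public static void main(String[] args) {
        // 14 queued files of 10 KB each: well under 1 MB and 100 files,
        // so the whole queue is swallowed in a single onTrigger call --
        // the behavior the original poster observed.
        List<Long> queue = new ArrayList<>();
        for (int i = 0; i < 14; i++) {
            queue.add(10_240L);
        }
        List<Long> batch = selectBatch(queue, 1_048_576L, 100);
        System.out.println("files pulled in one trigger: " + batch.size());
    }
}
```

With a one-flow-file `session.get()` (and back-pressure or run-schedule control), only a single file would move per trigger instead.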


Re: Nifi Clustering not working

2018-05-08 Thread Peter Wilcsinszky
Jonathan,

I think the issue is with how you apply the overrides in the properties
file. I've removed the trailing comments from the overridden lines and it
seems to be working.

Peter

On Fri, Apr 20, 2018 at 11:01 AM, Peter Wilcsinszky <
peterwilcsins...@gmail.com> wrote:

> Hi,
>
> I've tested your statefulset and it seems it doesn't even try to connect
> to zookeeper to form a cluster, although the config looks good. I'm still
> new to this, but will dig deeper later.
>
> Peter
>
> On Tue, Apr 17, 2018 at 6:55 PM, Jonathan Kosgei <
> jonat...@saharacluster.com> wrote:
>
>> Thank you for the tips.
>>
>> I've tried setting the FQDN as the hosts and made sure each was pingable
>> from all the other nodes, with no luck.
>>
>> In particular I see this error a lot:
>>
>> Unable to locate group with id 'd4779129-0162-1000-982f-9436d01eb801'.
>>
>> I'm wondering if I might have a port closed that should be open; is there
>> any list of all official ports? Looking in my logs the bootstrap port seems
>> to be different each time and I wonder if maybe they need to be accessible
>> for each pod?
>>
>> Here's a trimmed-down version of my logs (not in DEBUG mode):
>>
>> nifi-app.log
>> https://gist.github.com/jonathan-kosgei/81936b7e59a563f55111bfe5be06e8bc
>>
>> nifi-user.log
>> https://gist.github.com/jonathan-kosgei/19ce9929a764a40ac74a84fdcac2b3e2
>>
>> nifi-bootstrap.log
>> https://gist.github.com/jonathan-kosgei/e7396f558cbcd9a20cf6c0dca72c2728
>>
>> My StatefulSet and Services
>> https://gist.github.com/jonathan-kosgei/49a628edab109ba39da8598caf0e4f1e
>>
>> Thanks!
>>
>


Re: NiFi REST APIs : How to create flows through 'nifi-api'

2018-05-08 Thread Mike Thomsen
Brajendra,

The NiFi API is typically called from external clients, not from within a
flow. That said, as far as I understand your use case, I don't think
it's going to work out that well for you. What you seem to want to do here
is programmatically create new processors to handle new situations as they
come in, but you don't need to do that. For example, GetMongo is already
parameterized with expression language in 1.6 and can take input queries from
upstream processors. So you'd be better off building a
flow with upstream processors that tell GetMongo where to go (db +
collection) and what query to use, instead of using the
REST APIs to tweak GetMongo.
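As an illustration of that approach (the attribute names here are hypothetical), GetMongo's Query property can reference flow file attributes via expression language, so upstream processors drive the query without any REST calls:

```
Query: { "customerId": "${customer.id}", "status": "${order.status}" }
```

Each incoming flow file's attributes are substituted into the query at run time.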

WRT GetFile, my best suggestion would be to put a minimum age of 5-10 seconds in
the Minimum File Age field and then just build one processor. You're going to
have to figure out how to get files to a path that NiFi can see anyway, so
that's where your focus should be.

For you, starting with a message queue processor would probably be a better
approach. You might also need to get your hands a little dirty there and
add ExecuteScript or EvaluateJsonPath to read the incoming message and pull
out some properties from it.

On Tue, May 8, 2018 at 1:10 AM Brajendra Mishra <
brajendra_mis...@persistent.com> wrote:

> Hi Mike and Sivaprasanna,
>
> Could you please provide a few examples where I can create flows, process
> groups, processors, controller services, and input and output ports through
> the 'nifi-api' REST APIs?
>
> I need your suggestion on one approach, where I want to create one flow
> and create processors (through nifi-api) based on specific input requests
> (e.g., if the input is 'abc' then create a GetFile processor, and if the input
> is 'xyz' then create a GetMongo processor).
>
> In the above scenario, which approach would be more feasible (with respect
> to NiFi performance)? I have figured out the following 2 approaches:
>
>
>   1.  All inputs are handled by one 'router' processor, which redirects
> based on the input and creates the specific processor.
>   2.  The input is specific at the time it is given, and there is an
> individual flow for each processor.
>
> Please suggest.
>
> Brajendra Mishra
> Persistent Systems Ltd.
>
>
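Since the question above asks for concrete examples: a minimal sketch of the JSON entity the NiFi 1.x REST API expects when creating a processor. The group id, processor type, name, and position below are illustrative; you would POST this body with Content-Type application/json to /nifi-api/process-groups/{groupId}/processors.

```java
// Hypothetical sketch: builds the request body for creating a processor
// via the NiFi 1.x REST API. Only the payload construction is shown;
// sending it (e.g. with an HTTP client) is left to the caller.
public class CreateProcessorEntity {

    static String processorEntity(String type, String name, double x, double y) {
        // A new component starts at revision version 0; "type" is the
        // fully-qualified processor class, "position" places it on the canvas.
        return "{"
                + "\"revision\": {\"version\": 0},"
                + "\"component\": {"
                + "\"type\": \"" + type + "\","
                + "\"name\": \"" + name + "\","
                + "\"position\": {\"x\": " + x + ", \"y\": " + y + "}"
                + "}}";
    }

    public static void main(String[] args) {
        System.out.println(processorEntity(
                "org.apache.nifi.processors.standard.GetFile",
                "Fetch input files", 100.0, 200.0));
    }
}
```

Creating process groups, controller services, and ports follows the same pattern against their respective endpoints; the interactive API docs bundled with NiFi (under /nifi-docs/rest-api/) list the exact entities.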


Cluster behavior when heavily loaded

2018-05-08 Thread Mark Bean
We have a 6-node cluster using external ZooKeeper. It is heavily loaded,
and we are attempting to tune some of the properties to alleviate some
observed issues. By "heavily loaded" I mean the graph is large (approx.
3,000 processors) and there is a lot of data in process (approx. 2M
flowfiles/120GB queued).

One symptom we see is that changes to the graph are not replicated to other
nodes, and the node(s) are subsequently disconnected from the cluster. In
one example, we see in nifi-app.log that the node is disconnected due
to "failed to process request PUT
/nifi-api/connection/976a60b5d-3c4e-3bbb-8fbe-4790f3ecb147"

The following properties are set in nifi.properties:

nifi.cluster.node.protocol.threads=30
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=60 sec
nifi.cluster.node.read.timeout=60 sec
nifi.cluster.node.max.concurrent.requests=500
nifi.cluster.node.request.replication.claim.timeout=20 secs

nifi.zookeeper.connect.timeout=30 secs
nifi.zookeeper.session.timeout=30 secs

Some of the (timeout) values are set fairly high because of the heavy load;
we allow a longer time to complete tasks. Are there interrelated
properties for which a long timeout might actually become detrimental? Are
there other properties we should look at more closely?

Thanks,
Mark


Grok Reader, Extract and the default patterns

2018-05-08 Thread Otto Fowler
I’m working on upgrading java-grok to the new 0.1.9 release.  While going
through the GrokReader and the ExtractGrok components I noticed that they
differ in a very important way grok-wise.
The reader loads the default patterns (which are a copy of the ubiquitous
default patterns in java-grok itself). The older (I assume) ExtractGrok
processor does not.

I’m wondering if there is a reason for this going back to the creation of
the processor.  It seems to me it would be better for the ExtractGrok
processor and the GrokReader to work similarly with regard
to the default patterns.  At the moment, only one pattern file is
allowed to be specified with ExtractGrok (I think there is a PR for multiple
pattern files).  Besides consistency between the two components,
it seems like it would be better for users if they didn’t have to paste or
merge pattern files to get the common patterns.

I apologize in advance if I am missing something here.

Would anyone object to setting the default patterns as we do in the
GrokReader?

ottO


Re: Grok Reader, Extract and the default patterns

2018-05-08 Thread Joe Witt
You're probably not missing anything.  ExtractGrok came before the
record reader/writer enlightenment phase.

If you think ExtractGrok is useful and want to improve it as you
suggest I think you're good to go provided you do so in a way that
doesn't break existing flows (change their behavior unless they opt in
so to speak).  Is that feasible for your idea?

Thanks



Re: Grok Reader, Extract and the default patterns

2018-05-08 Thread Otto Fowler
I think so. The way that java-grok works (refactored for 0.1.9) is that
you register sets of patterns with the compiler, and the compiler produces
the Grok object.

When registering things with the same name, the last one in wins, so I
think if an existing user had put a pattern file in the configuration with
patterns named the same as the defaults, they would just replace them in
the map.
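The last-one-in-wins behavior described above can be sketched with a plain map (standalone Java; the real java-grok API registers patterns on a compiler instance, so the names here are illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Standalone illustration of "last one in wins" pattern registration:
// registering a name that already exists simply replaces the earlier entry.
public class PatternRegistryDemo {

    static final Map<String, String> registry = new LinkedHashMap<>();

    static void register(String name, String pattern) {
        registry.put(name, pattern); // later registrations overwrite earlier ones
    }

    public static void main(String[] args) {
        register("NUMBER", "(?:[+-]?[0-9]+)"); // built-in default registered first
        register("NUMBER", "[0-9]{1,3}");      // user's pattern file registered later
        // The user's version is what the compiler would see:
        System.out.println(registry.get("NUMBER"));
    }
}
```

So loading the defaults first and the user's pattern files second would preserve today's behavior for users who already shadow a default name.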





Re: Grok Reader, Extract and the default patterns

2018-05-08 Thread Otto Fowler
I would add a test for putting in the defaults as a file when they are
added by the processor as well.

=or=

I could expose a new property to turn it on.  But really, by the rule of
least surprise, I would expect the core groks to always exist and to be
overwritable if necessary.  We do that in Metron.

There is something that feels wrong about possible duplication between the
record reader serialization service NAR and the standard processors
(GrokExpressionValidator, for example), but that is another kettle of fish.




Re: Grok Reader, Extract and the default patterns

2018-05-08 Thread Otto Fowler
It is also possible that ExtractGrok should be @deprecated and noted as
such in its documentation, with the GrokReader being refactored to have
all the same levers to tune.

But I would suggest that that would or should be done after the upgrade to
0.1.9 (latest), and after the follow-on and current PRs about empty
capture returns and multiple pattern files.
I would guess that the outstanding JIRAs would also be evaluated before
deprecation.

