[GitHub] metron issue #1000: METRON-1533 Create KAFKA_FIND Stellar Function

2018-05-02 Thread merrimanr
Github user merrimanr commented on the issue:

https://github.com/apache/metron/pull/1000
  
I tested this in full dev and the results were somewhat inconsistent.  I 
listened on the enrichments topic with the kafka-console-consumer tool in one 
window:
```
/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh -z node1:2181 --topic enrichments
```
While repeatedly running this command in another:
```
KAFKA_FIND('enrichments', m -> MAP_GET('source.type', m) == 'snort')
```
About 25-50% of the time the Stellar shell returned `[]`; the rest of the 
time it returned a snort message, as expected.

How long will this command listen until it times out (or is it based on 
number of messages read)?  Sometimes it returned an empty array immediately.  
Is this configurable?  


---


2018-05-02 Thread merrimanr
Github user merrimanr commented on the issue:

I should add that I have both bro and snort parser topologies running.


---


2018-05-03 Thread nickwallen
Github user nickwallen commented on the issue:

Thanks for taking it for a test drive.  I think all of your observations are 
explainable, but they point to usability issues that I think I can improve 
on.

 1. Offsets

`KAFKA_FIND` 'sticks' on its consumer offset.  It operates more like 
`KAFKA_GET` than `KAFKA_TAIL`.  This is how I described it in the docs.

> Finds messages that satisfy a given filter expression. Subsequent calls 
will continue retrieving messages sequentially from the original offset.

When you first run `KAFKA_FIND`, its consumer offset is not yet set, so it 
picks up from the end of the topic.  When you run it again in the same 
session, it continues filtering from where it left off, rather than going 
to the end of the topic.
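
To make the 'sticky' behavior concrete, here is a minimal Python model of it. It is a sketch under stated assumptions, not the `KAFKA_FIND` implementation: a plain list stands in for a single-partition topic (list index == offset), no Kafka client is involved, and `StickyFinder`, `find`, and `max_messages` are hypothetical names.

```python
class StickyFinder:
    """Hypothetical model of a filter that keeps its offset between calls.

    `topic` is a plain Python list standing in for a Kafka partition; the
    index of each message is its offset. No Kafka client is used.
    """

    def __init__(self, topic):
        self.topic = topic
        self.offset = None  # no position before the first call

    def find(self, predicate, max_messages=10):
        if self.offset is None:
            # First call in the session: start at the current end of the topic.
            self.offset = len(self.topic)
        # Later calls resume from wherever the previous call stopped.
        batch = self.topic[self.offset:self.offset + max_messages]
        self.offset += len(batch)
        return [m for m in batch if predicate(m)]


def is_snort(message):
    return message.get("source.type") == "snort"


topic = [{"source.type": "bro"}, {"source.type": "snort"}]
finder = StickyFinder(topic)
first = finder.find(is_snort)   # offset starts at 2 (the end), so: []
topic.extend([{"source.type": "snort"}, {"source.type": "bro"}])
second = finder.find(is_snort)  # resumes at offset 2: [{'source.type': 'snort'}]
```

The second call never looks at the messages that existed before the first call, and it does not jump ahead either; it simply continues from offset 2.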

The `kafka-console-consumer` tool always seeks to the end when it is run.  
In your test it's likely that `kafka-console-consumer` and `KAFKA_FIND` were at 
completely different offsets as you tried to compare the two.

I had actually already been working on a version of this that always seeks 
to the end and so behaves more like `KAFKA_TAIL` and `kafka-console-consumer`.

Per the use case I described in the PR, I think 'seek to end' makes more 
sense: you make a change on a live stream and want to see the immediate 
results.  If `KAFKA_FIND` 'sticks' on an earlier offset, you're not going to 
see the most recent messages, which can be confusing for the user.
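
The difference between the two positioning strategies can be sketched with a small Python model. This is an illustration under assumptions, not Metron code: a list stands in for one topic partition, each call is split into a hypothetical `start()` (position the consumer) and `finish()` (read and filter) so that traffic can arrive in between, and the `seek_to_end` flag is an invented name for the 'seek to end' variant discussed above.

```python
class Finder:
    """Hypothetical model contrasting 'sticky' and 'seek to end' offsets.

    `topic` is a plain list standing in for a Kafka partition. A call is
    split into start() (position the consumer) and finish() (read and
    filter) so that messages can be appended in between, mimicking traffic
    that arrives while the function is polling.
    """

    def __init__(self, topic, seek_to_end):
        self.topic = topic
        self.seek_to_end = seek_to_end
        self.offset = None

    def start(self):
        if self.seek_to_end or self.offset is None:
            self.offset = len(self.topic)  # jump to the newest position

    def finish(self, predicate, max_messages=10):
        batch = self.topic[self.offset:self.offset + max_messages]
        self.offset += len(batch)
        return [m for m in batch if predicate(m)]


topic = []
sticky = Finder(topic, seek_to_end=False)
tailing = Finder(topic, seek_to_end=True)
for f in (sticky, tailing):           # an earlier call positions both at 0
    f.start()
    f.finish(lambda m: True)

topic.extend([{"age": "old"}] * 20)   # backlog builds up between calls
for f in (sticky, tailing):
    f.start()                         # the user re-runs the function
topic.extend([{"age": "new"}] * 3)    # traffic after the user's change

sticky_result = sticky.finish(lambda m: True)    # 10 'old' messages
tailing_result = tailing.finish(lambda m: True)  # 3 'new' messages
```

The sticky finder spends its call working through the backlog, while the seek-to-end finder only sees what arrived after the user's change, which matches the live-stream use case.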

 2. Timeouts

> How long will this command listen until it times out (or is it based on 
number of messages read)? ...  Is this configurable?

The command will poll for up to 5 seconds, by default.  This can be 
adjusted by defining a global property `stellar.kafka.max.wait`.
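
A rough Python sketch of how a default wait and an override could interact. Assumptions, all hypothetical: the global config is a plain dict, the value of `stellar.kafka.max.wait` is in milliseconds (the thread only states the 5 second default), and `poll` is any zero-argument callable returning a list of messages. `resolve_max_wait_ms` and `poll_for_messages` are illustrative names, not Metron's API.

```python
import time

# Assumptions: the global config is a plain dict, and the value of
# `stellar.kafka.max.wait` is in milliseconds. The thread only says the
# default is 5 seconds; the unit here is hypothetical.
DEFAULT_MAX_WAIT_MS = 5000


def resolve_max_wait_ms(global_config):
    """Use the global property if defined, else the 5 second default."""
    return int(global_config.get("stellar.kafka.max.wait", DEFAULT_MAX_WAIT_MS))


def poll_for_messages(poll, global_config, clock=time.monotonic):
    """Call poll() until it yields messages or the max wait elapses.

    `poll` is a hypothetical zero-argument callable returning a list of
    messages (a real consumer poll would block briefly on each call).
    `clock` returns seconds and is injectable so the loop can be tested.
    """
    deadline = clock() + resolve_max_wait_ms(global_config) / 1000.0
    while clock() < deadline:
        batch = poll()
        if batch:
            return batch
    return []
```

Injecting `clock` keeps the deadline logic testable without actually sleeping.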

> Sometimes it returned an empty array immediately. 

In this case, it probably pulled in messages from the topic, found that none 
of them matched your filter, and so returned an empty array.

I need to revisit the timeout logic under these conditions.  It should 
probably 'try harder' to find matching messages rather than returning 
immediately.  I'll take a look at this and see if it can be improved.
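
The suspected cause and the 'try harder' idea can be contrasted in a short Python sketch. This models the explanation above rather than the actual `KAFKA_FIND` code: `find_first_batch` returns after the first non-empty batch even if nothing matched, while `find_try_harder` keeps polling until something matches or the wait runs out. All function names and the fake poll/clock helpers are hypothetical.

```python
import time


def find_first_batch(poll, predicate, max_wait_s, clock=time.monotonic):
    """Returns as soon as any batch arrives, even if nothing in it matches.

    This models the suspected cause of the immediate empty result.
    """
    deadline = clock() + max_wait_s
    while clock() < deadline:
        batch = poll()
        if batch:
            return [m for m in batch if predicate(m)]
    return []


def find_try_harder(poll, predicate, max_wait_s, clock=time.monotonic):
    """Keeps polling until something matches or the max wait elapses."""
    deadline = clock() + max_wait_s
    while clock() < deadline:
        matches = [m for m in poll() if predicate(m)]
        if matches:
            return matches
    return []


bro, snort = {"source.type": "bro"}, {"source.type": "snort"}


def is_snort(message):
    return message.get("source.type") == "snort"


def fake_poll(batches):
    it = iter(batches)
    return lambda: next(it, [])  # empty list once the batches run out


def fake_clock():
    return iter(range(1000)).__next__  # advances one "second" per reading


early = find_first_batch(fake_poll([[bro], [snort]]), is_snort, 5, fake_clock())  # []
later = find_try_harder(fake_poll([[bro], [snort]]), is_snort, 5, fake_clock())  # [snort]
```

With a mixed bro/snort stream, the first variant can hand back `[]` as soon as a bro-only batch shows up, which would explain an empty result that comes back immediately.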

---


2018-05-22 Thread nickwallen
Github user nickwallen commented on the issue:

I made a bunch of enhancements based on the feedback I outlined above.  I 
am in the process of breaking that work out into multiple PRs so that it can be 
reviewed more easily.


---