Logstash as available parser

2019-03-28 Thread stephane.davy
Hello all,

I'm new to Metron; my installation was finished this morning, and I must
admit that it looks very exciting. I have a question regarding parsers. When I
add a new telemetry source, the "parser" list is longer than what is
documented. More precisely, there is a "logstash" parser that we are very
interested in, as we already use Elasticsearch and have a lot of ready-to-use
Logstash configurations.

Is there any documentation for it anywhere? I cannot find anything, and even
the source code says nothing.

Thanks a lot,

Stéphane




RE: Logstash as available parser

2019-03-29 Thread stephane.davy
Hello Mike,

Thanks for your reply. By the way, do you mean that I can just copy/paste
my Logstash "filter" configuration and it will work?

Stéphane


From: Michael Miklavcic [mailto:michael.miklav...@gmail.com]
Sent: Thursday, March 28, 2019 19:14
To: user@metron.apache.org
Subject: Re: Logstash as available parser

Hi Stéphane,

Welcome, and thanks for the interest in the project! The Logstash parser you
found is one of the parsers we inherited from the original open-sourced
OpenSOC project. We don't have any documentation specific to that parser (or
unit tests, as I'm looking at this), but it's actually not too complicated. The
parser can basically be summed up as the following steps:

  1. Parse Logstash messages as JSON.
  2. Remove meta-fields from the message: @version, type, host, tags.
  3. Rename some fields, e.g. src_ip -> ip_src_addr.
  4. Set a normalized timestamp field (millis since epoch) named "timestamp",
     taken from the @timestamp Logstash field.
That's pretty much it - there's currently no configuration required for this 
parser type. I'd run some sample data through the parser to try it out.
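If it helps to see those steps concretely, here is a minimal sketch of the
equivalent logic in Python (the real parser is Java; the exact rename map and
timestamp format below are assumptions for illustration, not what the parser
ships with):

import json
from datetime import datetime, timezone

META_FIELDS = {"@version", "type", "host", "tags"}   # step 2: meta-fields to drop
FIELD_RENAMES = {"src_ip": "ip_src_addr"}            # step 3: example rename

def parse_logstash(raw_message):
    msg = json.loads(raw_message)                    # step 1: parse as JSON
    for field in META_FIELDS:                        # step 2: drop meta-fields
        msg.pop(field, None)
    for old, new in FIELD_RENAMES.items():           # step 3: rename fields
        if old in msg:
            msg[new] = msg.pop(old)
    ts = msg.pop("@timestamp")                       # step 4: normalize timestamp
    parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    msg["timestamp"] = int(parsed.timestamp() * 1000)  # millis since epoch
    return msg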

Best,
Mike Miklavcic





RE: Not seeing feeds in metron -alerts ui

2019-04-04 Thread stephane.davy
Hello,

How many ES data nodes do you have? Given the following setting:
gateway:
  recover_after_data_nodes: 3

you must have at least 3 live data nodes before the cluster will recover and
become usable. I faced this issue last week after my install.
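If you are on a single node, a likely fix (assuming this gateway setting is
the only blocker) is to lower the threshold in elasticsearch.yml and restart
Elasticsearch:

gateway:
  recover_after_data_nodes: 1

You can then confirm recovery with (host taken from your config below):

curl 'http://10.3.1.67:9200/_cluster/health?pretty'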


Stéphane


From: Meenakshi.S [mailto:meenakshi.subraman...@inspirisys.com]
Sent: Thursday, April 04, 2019 14:44
To: user@metron.apache.org
Subject: RE: Not seeing feeds in metron -alerts ui

Hi

Elasticsearch health is red in Kibana and we are getting a cluster block
exception from Elasticsearch.

The Kibana dashboard is not up.

These are my config details. It is a single-node installation.

Regards,
Meenakshi

elasticsearch.yml

cluster:
  name: metron
  routing:
    allocation.node_concurrent_recoveries: 4
    allocation.disk.watermark.low: .97
    allocation.disk.threshold_enabled: true
    allocation.disk.watermark.high: 0.99

discovery:
  zen:
    ping:
      unicast:
        hosts: ["10.3.1.67"]

node:
  data: true
  master: true
  name: node1

path:
  data: "/opt/lmm/es_data"

http:
  port: 9200-9300
  cors.enabled: "false"

transport:
  tcp:
    port: 9300-9400

gateway:
  recover_after_data_nodes: 3
  recover_after_time: 15m
  expected_data_nodes: 0

# https://www.elastic.co/guide/en/elasticsearch/guide/current/indexing-performance.html
indices:
  store.throttle.type: none
  memory:
    index_buffer_size: 10%
  fielddata:
    cache.size: 25%

bootstrap:
  memory_lock: true
  system_call_filter: false

thread_pool:
  bulk:
    queue_size: 3000
  index:
    queue_size: 1000

discovery.zen.ping_timeout: 5s
discovery.zen.fd.ping_interval: 15s
discovery.zen.fd.ping_timeout: 60s
discovery.zen.fd.ping_retries: 5
discovery.zen.minimum_master_nodes: 1

network.host: [ _local_, _site_ ]
network.publish_host: []


Error

{"error":{"root_cause":[{"type":"cluster_block_exception","reason":"blocked by: 
[SERVICE_UNAVAILABLE/1/state not recovered / 
initialized];"}],"type":"cluster_block_exception","reason":"blocked by: 
[SERVICE_UNAVAILABLE/1/state not recovered / initialized];"},"status":503}



From: Michael Miklavcic [mailto:michael.miklav...@gmail.com]
Sent: 03 April 2019 20:15
To: user@metron.apache.org; meenakshi.subraman...@inspirisys.com
Subject: Re: Not seeing feeds in metron -alerts ui

I think I need a bit more context. Are you saying it makes it to indexing and 
then never makes it to ES or Solr? Are you running fulldev or another type of 
manual installation? Which index tool are you using, es or solr?

On Wed, Apr 3, 2019, 5:26 AM Meenakshi.S <meenakshi.subraman...@inspirisys.com> wrote:
Hi Team,

I am able to insert Snort-related feeds into Metron.

I can see the feed up to the indexing Kafka topic. After that I am not able to
trace it. Any help is highly appreciated.


Regards,
Meenakshi




Metron-REST is always stopping

2019-04-04 Thread stephane.davy
Hello all,

I installed Metron last week and everything was working correctly. I'm
currently playing with it and trying to understand how it works. After a few
hours spent on the Management GUI, I started to have some disconnections, and
finally I'm no longer able to log in. I can see that the Metron REST service is
actually stopped. Nevertheless, I am unable to start it anymore. It is first
reported as "started" in Ambari, and then goes to "stopped" roughly one minute
later.

Regarding the logs themselves:

-  The last lines of /var/log/metron/metron-rest.log are:



5359 [Atlas Notifier 0] WARN  o.a.k.c.p.ProducerConfig - The configuration 
'zookeeper.sync.time.ms' was supplied but isn't a known config.

5359 [Atlas Notifier 0] WARN  o.a.k.c.p.ProducerConfig - The configuration 
'session.timeout.ms' was supplied but isn't a known config.

5359 [Atlas Notifier 0] WARN  o.a.k.c.p.ProducerConfig - The configuration 
'auto.offset.reset' was supplied but isn't a known config.

5360 [Atlas Notifier 0] INFO  o.a.k.c.u.AppInfoParser - Kafka version : 
1.0.0.2.6.5.1050-37

5360 [Atlas Notifier 0] INFO  o.a.k.c.u.AppInfoParser - Kafka commitId : 
2ff1ddae17fb8503



-  The last lines in /var/log/ambari-agent/ambari-agent.log are:

INFO 2019-04-04 16:41:57,905 RecoveryManager.py:255 - METRON_REST needs 
recovery, desired = STARTED, and current = INSTALLED.

INFO 2019-04-04 16:42:04,790 RecoveryManager.py:255 - METRON_REST needs 
recovery, desired = STARTED, and current = INSTALLED.

INFO 2019-04-04 16:42:15,629 RecoveryManager.py:255 - METRON_REST needs 
recovery, desired = STARTED, and current = INSTALLED.

INFO 2019-04-04 16:42:15,640 Controller.py:410 - Adding recovery command START 
for component METRON_REST


Is there any other good place to find some logs?

Please note that:

-  MariaDB is up and running

-  Filesystems are not full

-  All the other Hortonworks services are up and running


Things actually started to go wrong when I stopped the bro, snort, yaf, ...
sensors, which I currently don't need.

Thanks for your help,

Stéphane


_

Ce message et ses pieces jointes peuvent contenir des informations 
confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce 
message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou 
falsifie. Merci.

This message and its attachments may contain confidential or privileged 
information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete 
this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been 
modified, changed or falsified.
Thank you.



RE: Metron-REST is always stopping

2019-04-04 Thread stephane.davy
Hi Simon,

Thanks for your answer

I had already hit the requests issue during my install, so I have the correct
version. The Metron GUI was not working, but I realized that even though the
service is reported as stopped, the process was still there, just in a bad
shape. I killed the process, started it again from Ambari, and now it works.

Best regards,

Stéphane


From: Simon Elliston Ball [mailto:si...@simonellistonball.com]
Sent: Thursday, April 04, 2019 16:59
To: user@metron.apache.org
Subject: Re: Metron-REST is always stopping

Did you check to see if it was listening? Sometimes this can be misreported in
Ambari if you have an incorrect version of the Python requests library
installed.
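For example (assuming the default Metron REST port of 8082; adjust if you've
changed it):

netstat -tlnp | grep 8082

If nothing is listening, the process is genuinely down regardless of what
Ambari reports.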
Simon




Metron concept

2019-04-07 Thread stephane.davy
Hello all,

There is one point that isn't clear for me. When sending data into Metron, are
all events indexed and sent to Elastic and/or HDFS, or only the events that
trigger a triage rule?

For now I'm trying to send some FW logs into Metron. I feed a Kafka topic with
NiFi, and I can see that the topic has data thanks to the Kafka CLI, but
nothing more happens after I've configured a new source from the Management
UI...

Stéphane




RE: Metron concept

2019-04-08 Thread stephane.davy
Hello Nick,

Thanks for your answer. I went through this post and see that all my events
should go into Elastic, which is what I want, but it isn't what I get ☹

I have the following basic setup:

-  A new telemetry source with a Grok parser (validated in the UI with a
sample) and a Kafka topic => the topic didn't exist before, and it is created
automatically, as I can see with the kafka-topics.sh CLI utility

-  A simple NiFi flow to push data into this topic => I can see some data
in the topic with the kafka-console-consumer.sh CLI utility.

But I have the feeling that my topology never consumes Kafka messages. The
Storm UI shows "0" nearly everywhere in my topology, and the Elastic index is
not created (_cat/indices). I also see nothing in the "indexing" Kafka topic.

And I see no error message, so I don't really know how to go on…

Does anybody have a suggestion for me? I guess I'm not the first one with this
kind of issue, but I cannot find any case close to mine.



From: Nick Allen [mailto:n...@nickallen.org]
Sent: Monday, April 08, 2019 15:17
To: user@metron.apache.org
Subject: Re: Metron concept

All events are indexed by default.

See if this guide helps you any.  
https://cwiki.apache.org/confluence/display/METRON/Adding+a+New+Telemetry+Data+Source




RE: Metron concept

2019-04-08 Thread stephane.davy
Hello Simon,

I send just one line at a time, and the line has been validated in the Metron 
UI. I see no message in the topology logs. I switched to DEBUG mode, and I can 
see the following sequence again and again:

2019-04-08 16:35:50.463 o.a.k.c.c.i.AbstractCoordinator 
Thread-14-kafkaSpout-executor[4 4] [DEBUG] Sending coordinator request for 
group forti1_parser to broker r-petya:6667 (id: 1011 rack: /default-rack)
2019-04-08 16:35:50.463 o.a.k.c.c.i.AbstractCoordinator 
Thread-14-kafkaSpout-executor[4 4] [DEBUG] Received group coordinator response 
ClientResponse(receivedTimeMs=1554734150463, disconnected=false, 
request=ClientRequest(expectResponse=true, 
callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@35437dce,
 
request=RequestSend(header={api_key=10,api_version=0,correlation_id=61518,client_id=consumer-1},
 body={group_id=forti1_parser}), createdTimeMs=1554734150463, 
sendTimeMs=1554734150463), 
responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}})
2019-04-08 16:35:50.562 o.a.k.c.NetworkClient Thread-14-kafkaSpout-executor[4 
4] [DEBUG] Sending metadata request {topics=[forti1]} to node 1011
2019-04-08 16:35:50.562 o.a.k.c.Metadata Thread-14-kafkaSpout-executor[4 4] 
[DEBUG] Updated cluster metadata version 30761 to Cluster(nodes = [r-petya:6667 
(id: 1011 rack: /default-rack), r-jigsaw:6667 (id: 1012 rack: /default-rack), 
r-wannacry.rd.francetelecom.fr:6667 (id: 1010 rack: /default-rack)], partitions 
= [Partition(topic = forti1, partition = 0, leader = 1012, replicas = [1012,], 
isr = [1012,]])


Is it normal to have 
“responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}})” in the 
response?

Thanks,

Stéphane


From: Simon Elliston Ball [mailto:si...@simonellistonball.com]
Sent: Monday, April 08, 2019 16:29
To: user@metron.apache.org
Subject: Re: Metron concept

Are you seeing events on the enrichments topic, and if so, are they getting to 
indexing? Any messages in the storm logs for these topologies?

Are you also certain the parser is correct, and there are no invalid or error 
messages being sent to the error index?

Simon


RE: Metron concept

2019-04-08 Thread stephane.davy
Well, I realize that the console-consumer works with the --zookeeper option,
which is the "old consumer", while it doesn't work when I specify
--bootstrap-server, which is the "new consumer" way. So, it looks like a Kafka
issue…



RE: Snort logs flow issue

2019-04-09 Thread stephane.davy
Hello Hema,

Unless I’m wrong, this must be setup in MySQL, the database you use for Metron 
REST.


From: Hema malini [mailto:nhemamalin...@gmail.com]
Sent: Tuesday, April 09, 2019 09:42
To: user@metron.apache.org
Subject: Re: Snort logs flow issue

Hi Michael,

Sorry, I just noticed the error in the Metron REST logs - Table 'user
settings' was not found. Do we have to create that HBase table? Where do I find
the HBase tables that were created? I can see only two namespaces in HBase -
default and hbase - with no tables created in them. Do I have to run Metron
REST in the dev profile?

Thanks & Regards
Hema

On Tue, Apr 9, 2019, 12:44 PM Hema malini <nhemamalin...@gmail.com> wrote:
Hi Michael,

Thanks for your reply. I couldn't find any errors in the Metron Alerts UI log.
I clicked search and changed the date range too. Still no records. Do we have
to run Metron REST in the dev profile?

On Mon, Apr 8, 2019, 7:50 PM Michael Miklavcic <michael.miklav...@gmail.com> wrote:
If you see them in the dashboard you should be able to see them in the alerts 
UI. Any errors in either the alerts UI or REST logs? Also, the new default 
behavior is that the UI doesn't initiate a search at login, it's up to the user 
to click search.

On Mon, Apr 8, 2019, 6:38 AM Hema malini <nhemamalin...@gmail.com> wrote:
After recreating the index, we are now able to visualize the data in the
Kibana Metron dashboard. How can we pass alerts to the Metron Alerts UI?
Currently there is no data in the Alerts UI. How do we configure the logs as
alerts?

On Sat, Apr 6, 2019, 9:21 PM Hema malini <nhemamalin...@gmail.com> wrote:
Sorry for the typo. Can you please help with the required configuration.

On Sat, Apr 6, 2019, 5:39 PM Hema malini <nhemamalin...@gmail.com> wrote:
Are we missing any configuration? Initially Elasticsearch was down. We figured
out the issue and fixed it. Now Elasticsearch is up. We restarted Metron
indexing, but those indices were still not created, so we created them
manually. Do we have to change any parser configuration? How will logs flow
into the Metron Alerts dashboard and the Kibana dashboard? What is the
required congratulation

On Fri, Apr 5, 2019, 11:52 PM Hema malini <nhemamalin...@gmail.com> wrote:
Sample message flowing in the indexing topic:
{"msg":"'snort test 
alert'","parallelenricher.splitter.end.ts":"1554384505264","sig_rev":"0","ip_dst_port":"50183","ethsrc":"08:00:27:E8:B0:7A","threat.triage.rules.0.comment":null,"tcpseq":"0x8DF34F4B","threat.triage.score":10.0,"dgmlen":"52","adapter.hostfromjsonlistadapter.end.ts":"1554384503452","adapter.geoadapter.begin.ts":"1554384503452","tcpwindow":"0x1F5","parallelenricher.splitter.begin.ts":"1554384505264","threat.triage.rules.0.score":"10","tcpack":"0x836687BD","protocol":"TCP","ip_dst_addr":"192.168.66.1","original_string":"01\/11\/17-20:53:16.104984
 ,1,999158,0,\"'snort test 
alert'\",TCP,192.168.66.121,8080,192.168.66.1,50183,08:00:27:E8:B0:7A,0A:00:27:00:00:00,0x42,***A,0x8DF34F4B,0x836687BD,,0x1F5,64,0,62040,52,53248","parallelenricher.enrich.end.ts":"1554384505342","threat.triage.rules.0.reason":null,"tos":"0","adapter.hostfromjsonlistadapter.begin.ts":"1554384503452","id":"62040","ip_src_addr":"192.168.66.121","timestamp":1484148196104,"ethdst":"0A:00:27:00:00:00","threat.triage.rules.0.name":null,"is_alert":"true","parallelenricher.enrich.begin.ts":"1554384505264","ttl":"64","source.type":"snort","adapter.geoadapter.end.ts":"1554384503453","ethlen":"0x42","iplen":"53248","adapter.threatinteladapter.begin.ts":"1554384505264","ip_src_port":"8080","tcpflags":"***A","guid":"2f6f3f3c-7739-47fe-aa04-3c62425fbcbf","sig_id":"999158","sig_generator":"1"}


On Fri, Apr 5, 2019, 11:43 PM Hema malini <nhemamalin...@gmail.com> wrote:
Yes I am getting messages

On Fri, Apr 5, 2019, 11:17 PM Michael Miklavcic <michael.miklav...@gmail.com> wrote:
Do you get 10 records output to the CLI when you run the following?

/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --zookeeper 
$ZOOKEEPER --topic indexing --from-beginning --max-messages 10


On Fri, Apr 5, 2019 at 11:38 AM Hema malini <nhemamalin...@gmail.com> wrote:
We verified it in the Storm UI and in the Storm topology logs.

On Fri, Apr 5, 2019, 10:53 PM Michael Miklavcic <michael.miklav...@gmail.com> wrote:
How did you validate the logs are making it to the indexing topology?

On Fri, Apr 5, 2019 at 8:12 AM Hema malini <nhemamalin...@gmail.com> wrote:

Hi,

We have installed Metron 0.7.1 on CentOS 7 using Ambari. Using NiFi we sent
the sample Snort logs copied from the Metron git repo to the snort Kafka
topic. We did the same for the bro topic. Logs are getting parsed and reach
the indexing topology. Elasticsearch indices are not getting created even
though we ran the Elasticsearch template install from Ambari, so we manually
created the Elasticsearch index using the template available in the Metron
repo. Though the Elasticsearch index is present, data f

RE: Metron concept

2019-04-09 Thread stephane.davy
Hello,

I haven’t sorted out yet this issue, but I think I’ve narrowed it. Actually, 
after many tests with Kafka console-consumer and basic Python scripts, I 
realize that I can only consume messages when I specify the partition number 
and not the group.id. This is of course not what storm tries to do, it tries do 
dynamically fetch the right partition and commit the offset.

My Kafka cluster is a fresh one installed with Hortonworks data platform with 3 
brokers. I can’t find any kind of option around this behavior. Moreover, we 
regularly use Kafka for some other purpose with our docker images and have 
never faced issues like this…

Any idea?

Thanks,

Stéphane



Question about "parser_invalid"

2019-04-09 Thread stephane.davy
Hello everybody,

Don't worry, I won't ask you to debug my Grok statement :)

By the way, I'm facing the following situation: in my "error_index" Elastic
index I have some documents whose raw_message field shows that the original
message was parsed, plus an "original_string" field which is the raw message
(see the attached screenshot).

What is wrong here? Why does it go to error_index?

Thanks,

Stéphane




RE: Question about "parser_invalid"

2019-04-10 Thread stephane.davy
Thanks Simon,

This solves my issue ☺



From: Simon Elliston Ball [mailto:si...@simonellistonball.com]
Sent: Wednesday, April 10, 2019 09:39
To: user@metron.apache.org
Subject: Re: Question about "parser_invalid"

Timestamp in Metron is always a Unix epoch, to avoid things like timezone
issues.

In this case, you can resolve this using a field transformation at the parsing
stage with the TO_EPOCH_TIMESTAMP function. Some custom parsers already do
this, but for those that don't, a simple bit of Stellar will clean it up.
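For example, something along these lines in the sensor parser config (the
input field name and date format here are just placeholders for whatever your
Grok statement produces):

{
  "fieldTransformations": [
    {
      "transformation": "STELLAR",
      "output": ["timestamp"],
      "config": {
        "timestamp": "TO_EPOCH_TIMESTAMP(timestamp_str, 'yyyy-MM-dd HH:mm:ss', 'UTC')"
      }
    }
  ]
}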

Simon




RE: Setup Threat Intel feed for Soltra and Hail a TAXII

2019-04-23 Thread stephane.davy
Hello Anil,

I'm not very familiar with all of this, but if you check the following URL:
https://metron.apache.org/current-book/metron-platform/metron-data-management/index.html,
there is a section called "Loading utilities" which describes using TAXII to
load threat intel data.

Stéphane
Stéphane


From: Anil Donthireddy [mailto:anil.donthire...@sstech.us]
Sent: Tuesday, April 23, 2019 01:53
To: user@metron.apache.org; d...@metron.apache.org
Cc: Satish Abburi; Christopher Berry
Subject: Setup Threat Intel feed for Soltra and Hail a TAXII

Hi,

I would like to set up a threat intel feed for Soltra and Hail a TAXII that
auto-polls every 5 minutes. Could someone point me to the documentation for
this?

Thanks,
Anil.




RE: Metron concept

2019-04-23 Thread stephane.davy
Hello Stefan,

Thanks for your email. Actually, I don't know exactly what was wrong with
Kafka, but I removed it, cleaned up ZooKeeper, reinstalled Kafka, and it
worked.

Stéphane


From: Stefan Kupstaitis-Dunkler [mailto:stefan@gmail.com]
Sent: Wednesday, April 24, 2019 07:06
To: user@metron.apache.org
Subject: Re: Metron concept

Hi Stephane,

Seeing this only now, so it might be a little late. Have you resolved it?

If not: how many Kafka nodes are you using? I had a similar issue using only
one broker while the default config expected more. The way you describe this
issue, it might be that your configuration expects 3 or more nodes (resulting
in offset issues). If this matches your problem, could you try setting
"offsets.topic.replication.factor" to your actual number of brokers (if < 3)?
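You can check the current value on the internal offsets topic with the same
CLI tools you've been using, e.g.:

/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper $ZOOKEEPER
--describe --topic __consumer_offsets

If the ReplicationFactor reported there is larger than the number of live
brokers, the group coordinator cannot come up, which would also explain the
error_code=15 (coordinator not available) responses in your DEBUG log.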

Are your Kafka brokers running without errors? (before and after changing this 
config)

Best,
Stefan


On Tue, 9 Apr 2019, 13:44 Simon Elliston Ball
<si...@simonellistonball.com> wrote:
One thing worth noting is that group.id is essentially a client identifier, so
if you specify one that matches another consumer (such as the Metron
topologies) they will interfere, and messages are likely to be balanced across
your console and the actual Metron processes. So generally, when watching a
Kafka topic for debugging, you should let Kafka choose a random group.id. You
should not have to specify the partition either if you want the console to
show all messages.
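In other words, for debugging, something like:

/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server
$BROKERS --topic forti1 --from-beginning

(with no --group and no --partition flag, $BROKERS being your broker list) is
the safest way to see everything on the topic.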

Simon


Various questions around profiler

2019-04-24 Thread stephane.davy
Hello everybody,

I've been playing with Metron for a few weeks now. It is really a very
exciting project, and I'd first like to thank all the contributors. I'm
currently investigating the use of the profiler. I've tested it with the basic
example of counting IP addresses as explained in the docs, and now it's time
for questions:

-  In the various examples I've found, each profile computes only one
value. Is it possible to do something like this:

{
  "profiles": [
    {
      "profile": "test",
      "foreach": "ip_src_addr",
      "onlyif": "exists(ip_src_addr)",
      "init": { "count": 0, "sum_rcvd_bytes": 0 },
      "update": { "count": "count + 1",
                  "sum_rcvd_bytes": "sum_rcvd_bytes + rcvdbyte" },
      "result": "{'count': count, 'sum_rcvd_bytes': sum_rcvd_bytes}"
    }
  ]
}



From the Stellar CLI, it seems to work fine, but when I try it during data
ingest, I see no data coming into the profiler table. Please note that I've
waited for the 15 minutes, and that I have deleted the data in the profiler
table using the "truncate_preserve" command in HBase.

-  In case of issues, what is the right procedure to reinitialize the
whole profiler stack?

-  The data in the profiler table can be pulled with Stellar functions,
which include some advanced features like statistics and cardinality, but is
it possible to access all of this from Java / Scala / any other language?

-  The MaaS service seems to apply to the incoming data only; is it
possible to use it on the aggregated profile data instead?


Maybe too many questions in the same mail?

Thanks,

Stéphane




RE: Various questions around profiler

2019-04-24 Thread stephane.davy
Hello Anil,

Thanks for your feedback. By the way, I need to dig a little deeper into MaaS
usage to understand what can be done.

Have a nice day

Stéphane


From: Anil Donthireddy [mailto:anil.donthire...@sstech.us]
Sent: Wednesday, April 24, 2019 19:25
To: DAVY Stephane OBS/CSO
Cc: user@metron.apache.org
Subject: RE: Various questions around profiler

Hi Stephane.

Please find my comments inline below.

From: stephane.d...@orange.com [mailto:stephane.d...@orange.com]
Sent: Wednesday, April 24, 2019 1:30 AM
To: user@metron.apache.org
Subject: Various questions around profiler

Hello everybody,

I've been playing with Metron for a few weeks now, it is really a very exciting 
project and I'd like first to thanks all the contributors. I'm currently 
investigating around the use of profiler. I've tested it with the basic example 
of counting IP address as explained in the doc, and now it's time for questions:

-  In the various examples I've found, each profile computes only one 
value. Is it possible to do something like that:

{
  "profiles": [
    {
      "profile": "test",
      "foreach": "ip_src_addr",
      "onlyif": "exists(ip_src_addr)",
      "init": { "count": 0, "sum_rcvd_bytes": 0 },
      "update": { "count": "count + 1",
                  "sum_rcvd_bytes": "sum_rcvd_bytes + rcvdbyte" },
      "result": "{'count': count, 'sum_rcvd_bytes': sum_rcvd_bytes}"
    }
  ]
}



From the Stellar CLI, it seems to work fine, but when I try during data
ingest, I see no data coming into the profiler table. Please note that I've
waited for the 15 minutes, and that I have deleted the data in the profiler
table using the "truncate_preserve" command in HBase.

You need to have a "profile" field in the result, which is what goes to the
HBase table, as below:

"result": {
  "profile": ""
}

However, I am not quite sure if you can send multiple values like count and
sum_rcvd_bytes to the profile table.



-  In case of issues, what is the right procedure to reinitialize the
whole profiler stack?
I am not sure if there will be any issue with truncate_preserve. To my
understanding it should not cause any issue, but I have never tried.

-  The data in the profiler table can be pulled with Stellar functions 
which include some advanced features like statistics, cardinality, but is it 
possible to access all of this from a Java / Scala / any other language?
Metron has a profiler client implemented in Java. We should be able to import the 
Metron profiler client jars from JVM languages like Java/Scala to 
query the profiler statistics from code. We are doing it in Java.
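
As an illustration only, a minimal sketch of what such code could look like. The 
ProfilerClient interface below is a hypothetical stand-in that mirrors what 
PROFILE_GET does; the real class and method signatures live in the 
metron-profiler-client module and should be checked there:

import java.util.List;
import java.util.Optional;

public class ProfileReader {

  // Hypothetical interface mirroring what PROFILE_GET does internally:
  // fetch the stored values of one profile/entity over a time window.
  interface ProfilerClient {
    <T> List<T> fetch(Class<T> clazz, String profile, String entity,
                      long startMillis, long endMillis, Optional<T> defaultValue);
  }

  // Sum the per-period counts of the 'simple_count' profile for one source IP.
  static long sumCounts(ProfilerClient client, long start, long end) {
    List<Long> values = client.fetch(Long.class, "simple_count", "22.0.35.5",
                                     start, end, Optional.of(0L));
    return values.stream().mapToLong(Long::longValue).sum();
  }
}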

-  The MaaS service seems to apply to the incoming data only, how is it 
possible to use it only on the aggregated profile data?
As far as I have used MaaS, I used it in enrichment. If you need to do something 
related to each event, then you can query the profiler data in enrichment, store it 
in a new column, and apply MaaS in enrichment. If you want to apply MaaS at the 
entity level, then first we need to check if the MaaS function is available on the 
classpath of the profiler execution. If it exists, we can call MaaS at possible 
places like the init and update parts of the profile definition, or even have a 
nested profile definition, based on the use case.


Maybe too many questions in the same mail?

Thanks,

Stéphane

_



Ce message et ses pieces jointes peuvent contenir des informations 
confidentielles ou privilegiees et ne doivent donc

pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce 
message par erreur, veuillez le signaler

a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
electroniques etant susceptibles d'alteration,

Orange decline toute responsabilite si ce message a ete altere, deforme ou 
falsifie. Merci.



This message and its attachments may contain confidential or privileged 
information that may be protected by law;

they should not be distributed, used or copied without authorisation.

If you have received this email in error, please notify the sender and delete 
this message and its attachments.

As emails may be altered, Orange is not liable for messages that have been 
modified, changed or falsified.

Thank you.

_

Ce message et ses pieces jointes peuvent contenir des informations 
confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce 
message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou 
falsifie. Merci.

This mes

RE: Various questions around profiler

2019-04-24 Thread stephane.davy
Hello Nick,

Thanks for your answer. Well, I don't know what the issue was, but after a 
restart the profiling data went to HBase. With the configuration below ("result": 
"{'count': count, 'sum_rcvd_bytes': sum_rcvd_bytes}"), I get something like 
this in HBase:

value=\x01\x00java.util.HashMa\xF0\x01\x02\x03\x01coun\xF4\x02\x8A\x03\x03\x0 
\x00\x00\x00\x00\x00\x1Ab\x18   1sum_rcvd_byte\xF3\x09\xDC\x9D\xC3\x09

So, not sure that it is usable. Do not hesitate to share your upcoming work on 
all of this ☺

Cordialement / Best regards

Stéphane

From: Nick Allen [mailto:n...@nickallen.org]
Sent: Wednesday, April 24, 2019 15:08
To: user@metron.apache.org
Subject: Re: Various questions around profiler

>  In the various examples I’ve found, each profile computes only one value. Is 
> it possible to do something like that:

Currently, each profile produces only a single value.  But I've had the same 
thought as you.  It would be nice to be able to persist multiple values in one 
shot.  Feel free to create a JIRA if you have thoughts on how that might work.


> From the Stellar CLI, it seems to work fine, but when I try during data 
> ingest, I see no data coming into the profiler table. Please note that I've 
> waited for the 15 minutes, and that I have deleted data in the profiler table 
> using the "truncate_preserve" command in hbase.

(1) Have you changed any of the default Profiler settings like the profile 
period?

(2) In the Stellar REPL (CLI) did you test with the real telemetry from Kafka 
or just mock messages?  Retrieve some of the real streaming telemetry using 
`KAFKA_GET` and test that against your profile definition.  Maybe there is a 
difference between the mock data that you tested with versus the real streaming 
data.

(3) To debug further, in the Storm UI go to the Profiler topology, then turn on 
DEBUG logging for the package "org.apache.metron.profiler".  Leave this extra 
logging running for at least the length of your profile period (by default 15 
minutes) and see what the logs show.
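
If the Storm UI is not reachable, the same thing can be done from the CLI with 
storm set_log_level; the topology name and the 900-second timeout below are 
assumptions to adjust for your deployment:

# Enable DEBUG logging on the running profiler topology for 900 seconds,
# after which Storm reverts the logger to its previous level.
storm set_log_level profiler -l org.apache.metron.profiler=DEBUG:900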


>  The data in the profiler table can be pulled with Stellar functions which 
> include some advanced features like statistics, cardinality, but is it 
> possible to access all of this from a Java / Scala / any other language?

There is an internal class that the PROFILE_GET Stellar function uses to 
actually retrieve the profiles from HBase.  You could possibly use this as a 
Java API.  But this is definitely not ideal.

I have done some proof-of-concept work to store the profile data using Phoenix 
in Hbase, which would allow you to access all of the profile data using 
SQL/JDBC/ODBC.  That would allow you to retrieve that data with the tooling of 
your choice, even including something like a Zeppelin Notebook.  When I have 
some cycles, I'd like to share that with the community.
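
To make that concrete, access through Phoenix could look like the sketch below; 
the table and column names are entirely hypothetical, since the schema is still 
proof-of-concept:

-- Hypothetical schema; connect over JDBC with jdbc:phoenix:<zookeeper>:2181:/hbase
SELECT entity, period_start, profile_value
FROM   profiler
WHERE  profile = 'test'
ORDER  BY period_start DESC;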

On Wed, Apr 24, 2019 at 4:30 AM <stephane.d...@orange.com> wrote:
Hello everybody,

I've been playing with Metron for a few weeks now; it is really a very exciting 
project, and I'd first like to thank all the contributors. I'm currently 
investigating the use of the profiler. I've tested it with the basic example 
of counting IP addresses as explained in the doc, and now it's time for questions:

-  In the various examples I’ve found, each profile computes only one 
value. Is it possible to do something like that:

{
  "profiles": [
    {
      "profile": "test",
      "foreach": "ip_src_addr",
      "onlyif": "exists(ip_src_addr)",
      "init":   { "count": 0, "sum_rcvd_bytes": 0 },
      "update": { "count": "count + 1",
                  "sum_rcvd_bytes": "sum_rcvd_bytes + rcvdbyte" },
      "result": "{'count': count, 'sum_rcvd_bytes': sum_rcvd_bytes}"
    }
  ]
}



From the Stellar CLI, it seems to work fine, but when I try during data 
ingest, I see no data coming into the profiler table. Please note that I've 
waited for the 15 minutes, and that I have deleted data in the profiler table 
using the "truncate_preserve" command in hbase.

-  In case of issue, what is the right procedure to reinitialize all 
the profiler stack?

-  The data in the profiler table can be pulled with Stellar functions 
which include some advanced features like statistics, cardinality, but is it 
possible to access all of this from a Java / Scala / any other language?

-  The MaaS service seems to apply to the incoming data only, how is it 
possible to use it only on the aggregated profile data?


Maybe too many questions in the same mail?

Thanks,

Stéphane

_



Ce message et ses pieces jointes peuvent contenir des informations 
confidentielles ou privilegiees et ne doivent donc

pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce 
message par erreur, veuillez le signaler

a l'expediteur et le detruire ainsi que les pieces jointes.

RE: Various questions around profiler

2019-04-25 Thread stephane.davy
Anil,

Do you have any examples you can share about the use of profiler jars in your 
Java code?

Thanks,

Stéphane


From: Anil Donthireddy [mailto:anil.donthire...@sstech.us]
Sent: Wednesday, April 24, 2019 19:25
To: DAVY Stephane OBS/CSO
Cc: user@metron.apache.org
Subject: RE: Various questions around profiler

Hi Stephane,

Please find my comments below in yellow.

From: stephane.d...@orange.com [mailto:stephane.d...@orange.com]
Sent: Wednesday, April 24, 2019 1:30 AM
To: user@metron.apache.org
Subject: Various questions around profiler

Hello everybody,

I've been playing with Metron for a few weeks now; it is really a very exciting 
project, and I'd first like to thank all the contributors. I'm currently 
investigating the use of the profiler. I've tested it with the basic example 
of counting IP addresses as explained in the doc, and now it's time for questions:

-  In the various examples I've found, each profile computes only one 
value. Is it possible to do something like that:

{
  "profiles": [
    {
      "profile": "test",
      "foreach": "ip_src_addr",
      "onlyif": "exists(ip_src_addr)",
      "init":   { "count": 0, "sum_rcvd_bytes": 0 },
      "update": { "count": "count + 1",
                  "sum_rcvd_bytes": "sum_rcvd_bytes + rcvdbyte" },
      "result": "{'count': count, 'sum_rcvd_bytes': sum_rcvd_bytes}"
    }
  ]
}



From the Stellar CLI, it seems to work fine, but when I try during data 
ingest, I see no data coming into the profiler table. Please note that I've 
waited for the 15 minutes, and that I have deleted data in the profiler table 
using the "truncate_preserve" command in hbase.

You need to have a "profile" field in the result, which is what goes to the 
HBase table, as below:

"result": {
    "profile": ""
}

However, I am not quite sure if you can send multiple values 
like count and sum_rcvd_bytes to the profile table.



-  In case of issue, what is the right procedure to reinitialize all 
the profiler stack?
I am not sure if there will be any issue with truncate_preserve. To my 
understanding it should not cause any issue, but I have never tried it.

-  The data in the profiler table can be pulled with Stellar functions 
which include some advanced features like statistics, cardinality, but is it 
possible to access all of this from a Java / Scala / any other language?
Metron has a profiler client implemented in Java. We should be able to import the 
Metron profiler client jars from JVM languages like Java/Scala to 
query the profiler statistics from code. We are doing it in Java.

-  The MaaS service seems to apply to the incoming data only, how is it 
possible to use it only on the aggregated profile data?
As far as I have used MaaS, I used it in enrichment. If you need to do something 
related to each event, then you can query the profiler data in enrichment, store it 
in a new column, and apply MaaS in enrichment. If you want to apply MaaS at the 
entity level, then first we need to check if the MaaS function is available on the 
classpath of the profiler execution. If it exists, we can call MaaS at possible 
places like the init and update parts of the profile definition, or even have a 
nested profile definition, based on the use case.


Maybe too many questions in the same mail?

Thanks,

Stéphane

_



Ce message et ses pieces jointes peuvent contenir des informations 
confidentielles ou privilegiees et ne doivent donc

pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce 
message par erreur, veuillez le signaler

a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
electroniques etant susceptibles d'alteration,

Orange decline toute responsabilite si ce message a ete altere, deforme ou 
falsifie. Merci.



This message and its attachments may contain confidential or privileged 
information that may be protected by law;

they should not be distributed, used or copied without authorisation.

If you have received this email in error, please notify the sender and delete 
this message and its attachments.

As emails may be altered, Orange is not liable for messages that have been 
modified, changed or falsified.

Thank you.

_

Ce message et ses pieces jointes peuvent contenir des informations 
confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce 
message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou 
falsifie. Merci.

This message and its attachments may contain conf

About Elastic templates

2019-04-25 Thread stephane.davy
Hello all,

As we heavily use Elasticsearch in our company, with some support from the 
Elastic company, I'd like to share some notes about indexes and templates. Here 
is the starting template I use:

{
  "<template_name>": {
    "template": "<sensor>_index_*",
    "settings": {
      "index": {
        "number_of_shards": "1",
        "number_of_replicas": "1"
      }
    },
    "mappings": {
      "_default_": {
        "dynamic_templates": [
          {
            "strings_as_keywords": {
              "match_mapping_type": "string",
              "mapping": {
                "type": "keyword"
              }
            }
          }
        ],
        "properties": {
          "timestamp": {
            "type": "date"
          },
          "@version": {
            "type": "text"
          },
          "ip_dst_addr": {
            "type": "ip"
          },
          "ip_src_addr": {
            "type": "ip"
          },
          "metron_alert": {
            "type": "nested"
          }
        }
      }
    },
    "aliases": {}
  }
}

Of course, replace the placeholders in angle brackets with your config.

What we do here is start with shards = 1 instead of shards = 5, which is the 
default, and a really bad default according to Elastic support. If you have 
small indices and a lot of shards, you will kill your Elasticsearch performance.
Regarding field types:

-  We set the default to "keyword", as "string" is now deprecated

-  We specialize all the non-string fields with their real type. The 
"ip" type is really useful; it allows CIDR notation in queries (see the example 
after this list)

-  The metron_alert one is needed, otherwise the different GUIs exhibit 
some errors.
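
For instance, with ip_src_addr mapped as "ip", a plain term query accepts CIDR 
notation. A minimal sketch (host and index pattern are assumptions):

curl -s 'http://localhost:9200/<sensor>_index_*/_search' \
  -H 'Content-Type: application/json' \
  -d '{ "query": { "term": { "ip_src_addr": "194.51.198.0/24" } } }'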


I hope this helps

Stéphane

_

Ce message et ses pieces jointes peuvent contenir des informations 
confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce 
message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou 
falsifie. Merci.

This message and its attachments may contain confidential or privileged 
information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete 
this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been 
modified, changed or falsified.
Thank you.



RE: About Elastic templates

2019-04-25 Thread stephane.davy
I realize that I've missed a part of the story regarding shards. A good size 
for a shard is around 40-50 GB. So, if your index grows up to 200 or 300 GB, you 
of course need to increase the number of shards to come back to around this size 
(e.g. a 300 GB index at ~50 GB per shard suggests 6 shards).

This is also why I'd suggest having yyyy.MM.dd in the "Elasticsearch Date 
Format" configuration, so as not to create hourly indices that would be very small.
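
For reference, raising the shard count for future daily indices is just a 
template update with the legacy template API of that era (the names are 
placeholders, and note that PUT replaces the whole template, so in practice you 
would resubmit the full body including the mappings):

curl -s -X PUT 'http://localhost:9200/_template/<template_name>' \
  -H 'Content-Type: application/json' \
  -d '{
    "template": "<sensor>_index_*",
    "settings": { "index": { "number_of_shards": "6", "number_of_replicas": "1" } }
  }'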

Stéphane

From: Nick Allen [mailto:n...@nickallen.org]
Sent: Thursday, April 25, 2019 14:25
To: DAVY Stephane OBS/CSO
Subject: Re: About Elastic templates

Thanks for sharing Stéphane!



On Thu, Apr 25, 2019 at 5:58 AM <stephane.d...@orange.com> wrote:
Hello all,

As we heavily use Elasticsearch in our company, with some support from the 
Elastic company, I'd like to share some notes about indexes and templates. Here 
is the starting template I use:

{
  "<template_name>": {
    "template": "<sensor>_index_*",
    "settings": {
      "index": {
        "number_of_shards": "1",
        "number_of_replicas": "1"
      }
    },
    "mappings": {
      "_default_": {
        "dynamic_templates": [
          {
            "strings_as_keywords": {
              "match_mapping_type": "string",
              "mapping": {
                "type": "keyword"
              }
            }
          }
        ],
        "properties": {
          "timestamp": {
            "type": "date"
          },
          "@version": {
            "type": "text"
          },
          "ip_dst_addr": {
            "type": "ip"
          },
          "ip_src_addr": {
            "type": "ip"
          },
          "metron_alert": {
            "type": "nested"
          }
        }
      }
    },
    "aliases": {}
  }
}

Of course, replace the placeholders in angle brackets with your config.

What we do here is start with shards = 1 instead of shards = 5, which is the 
default, and a really bad default according to Elastic support. If you have 
small indices and a lot of shards, you will kill your Elasticsearch performance.
Regarding field types:

-  We set the default to "keyword", as "string" is now deprecated

-  We specialize all the non-string fields with their real type. The 
"ip" type is really useful; it allows CIDR notation in queries

-  The metron_alert one is needed, otherwise the different GUIs exhibit 
some errors.


I hope this helps

Stéphane

_



Ce message et ses pieces jointes peuvent contenir des informations 
confidentielles ou privilegiees et ne doivent donc

pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce 
message par erreur, veuillez le signaler

a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
electroniques etant susceptibles d'alteration,

Orange decline toute responsabilite si ce message a ete altere, deforme ou 
falsifie. Merci.



This message and its attachments may contain confidential or privileged 
information that may be protected by law;

they should not be distributed, used or copied without authorisation.

If you have received this email in error, please notify the sender and delete 
this message and its attachments.

As emails may be altered, Orange is not liable for messages that have been 
modified, changed or falsified.

Thank you.

_

Ce message et ses pieces jointes peuvent contenir des informations 
confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce 
message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou 
falsifie. Merci.

This message and its attachments may contain confidential or privileged 
information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete 
this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been 
modified, changed or falsified.
Thank you.



RE: Various questions around profiler

2019-04-25 Thread stephane.davy
Well, it seems that I have another issue right now:

[Stellar]>>> PROFILE_GET('simple_count','22.0.35.5', PROFILE_FIXED(30, 
'MINUTES'))
[!] Unable to parse: PROFILE_GET('simple_count','22.0.35.5', PROFILE_FIXED(30, 
'MINUTES')) due to: Unable to access table: profiler

It looks like a permission issue, but I haven't set up any security right now.

Any idea?


From: Nick Allen [mailto:n...@nickallen.org]
Sent: Thursday, April 25, 2019 14:29
To: user@metron.apache.org
Subject: Re: Various questions around profiler


Try querying for that record in the REPL with PROFILE_GET and then using 
MAP_GET on it.  It will likely work as you expect.

value := PROFILE_GET("test", ...)
MAP_GET("sum_rcvd_bytes", value)



On Thu, Apr 25, 2019 at 2:38 AM <stephane.d...@orange.com> wrote:
Hello Nick,

Thanks for your answer. Well, I don't know what the issue was, but after a 
restart the profiling data went to HBase. With the configuration below ("result": 
"{'count': count, 'sum_rcvd_bytes': sum_rcvd_bytes}"), I get something like 
this in HBase:

value=\x01\x00java.util.HashMa\xF0\x01\x02\x03\x01coun\xF4\x02\x8A\x03\x03\x0 
\x00\x00\x00\x00\x00\x1Ab\x18   1sum_rcvd_byte\xF3\x09\xDC\x9D\xC3\x09

So, not sure that it is usable. Do not hesitate to share your upcoming work on 
all of this ☺

Cordialement / Best regards

Stéphane

From: Nick Allen [mailto:n...@nickallen.org]
Sent: Wednesday, April 24, 2019 15:08
To: user@metron.apache.org
Subject: Re: Various questions around profiler

>  In the various examples I’ve found, each profile computes only one value. Is 
> it possible to do something like that:

Currently, each profile produces only a single value.  But I've had the same 
thought as you.  It would be nice to be able to persist multiple values in one 
shot.  Feel free to create a JIRA if you have thoughts on how that might work.


> From the Stellar CLI, it seems to work fine, but when I try during data 
> ingest, I see no data coming into the profiler table. Please note that I've 
> waited for the 15 minutes, and that I have deleted data in the profiler table 
> using the "truncate_preserve" command in hbase.

(1) Have you changed any of the default Profiler settings like the profile 
period?

(2) In the Stellar REPL (CLI) did you test with the real telemetry from Kafka 
or just mock messages?  Retrieve some of the real streaming telemetry using 
`KAFKA_GET` and test that against your profile definition.  Maybe there is a 
difference between the mock data that you tested with versus the real streaming 
data.

(3) To debug further, in the Storm UI go to the Profiler topology, then turn on 
DEBUG logging for the package "org.apache.metron.profiler".  Leave this extra 
logging running for at least the length of your profile period (by default 15 
minutes) and see what the logs show.


>  The data in the profiler table can be pulled with Stellar functions which 
> include some advanced features like statistics, cardinality, but is it 
> possible to access all of this from a Java / Scala / any other language?

There is an internal class that the PROFILE_GET Stellar function uses to 
actually retrieve the profiles from HBase.  You could possibly use this as a 
Java API.  But this is definitely not ideal.

I have done some proof-of-concept work to store the profile data using Phoenix 
in Hbase, which would allow you to access all of the profile data using 
SQL/JDBC/ODBC.  That would allow you to retrieve that data with the tooling of 
your choice, even including something like a Zeppelin Notebook.  When I have 
some cycles, I'd like to share that with the community.

On Wed, Apr 24, 2019 at 4:30 AM <stephane.d...@orange.com> wrote:
Hello everybody,

I've been playing with Metron for a few weeks now; it is really a very exciting 
project, and I'd first like to thank all the contributors. I'm currently 
investigating the use of the profiler. I've tested it with the basic example 
of counting IP addresses as explained in the doc, and now it's time for questions:

-  In the various examples I’ve found, each profile computes only one 
value. Is it possible to do something like that:

{
  "profiles": [
    {
      "profile": "test",
      "foreach": "ip_src_addr",
      "onlyif": "exists(ip_src_addr)",
      "init":   { "count": 0, "sum_rcvd_bytes": 0 },
      "update": { "count": "count + 1",
                  "sum_rcvd_bytes": "sum_rcvd_bytes + rcvdbyte" },
      "result": "{'count': count, 'sum_rcvd_bytes': sum_rcvd_bytes}"
    }
  ]
}



From the Stellar CLI, it seems to work fine, but when I try during data 
ingest, I see no data coming into the profiler table. Please note that I've 
waited for the 15 minutes, and that I have deleted data in the profiler table 
using the "truncate_preserve" command in hbase.

-  In case of issue, what is the right procedure to reinitialize all 
the profiler stack?

-  Th

Issue when trying to load JSON

2019-04-25 Thread stephane.davy
Hello,

I'm trying to load some JSON data which has the following structure (this is a 
sample):

{
  "_index": "indexing",
  "_type": "Event",
  "_id": "AWAkTAefYn0uCUpkHmCy",
  "_score": 1,
  "_source": {
"dst": "127.0.0.1",
"devTimeEpoch": "151243734",
"dstPort": "0",
"srcPort": "80",
"src": "194.51.198.185"
  }
}

In my file, everything is on the same line. My parser config is the following:

{
  "parserClassName": "org.apache.metron.parsers.json.JSONMapParser",
  "filterClassName": null,
  "sensorTopic": "my_topic",
  "outputTopic": null,
  "errorTopic": null,
  "writerClassName": null,
  "errorWriterClassName": null,
  "readMetadata": true,
  "mergeMetadata": true,
  "numWorkers": 2,
  "numAckers": null,
  "spoutParallelism": 1,
  "spoutNumTasks": 1,
  "parserParallelism": 2,
  "parserNumTasks": 2,
  "errorWriterParallelism": 1,
  "errorWriterNumTasks": 1,
  "spoutConfig": {},
  "securityProtocol": null,
  "stormConfig": {},
  "parserConfig": {
  },
  "fieldTransformations": [
   {
 "transformation":"RENAME",
 "config": {
"dst": "ip_dst_addr",
"src": "ip_src_addr",
"srcPort": "ip_src_port",
"dstPort": "ip_dst_port",
"devTimeEpoch": "timestamp"
 }
   }
  ],
  "cacheConfig": {},
  "rawMessageStrategy": "ENVELOPE",
  "rawMessageStrategyConfig": {
"messageField": "_source"
  }
}

But in Storm I get the following errors:

2019-04-25 16:45:22.225 o.a.s.d.executor Thread-5-parserBolt-executor[8 8] [ERROR]
java.lang.ClassCastException: java.util.LinkedHashMap cannot be cast to java.lang.String
        at org.apache.metron.common.message.metadata.EnvelopedRawMessageStrategy.get(EnvelopedRawMessageStrategy.java:78) ~[stormjar.jar:?]
        at org.apache.metron.common.message.metadata.RawMessageStrategies.get(RawMessageStrategies.java:54) ~[stormjar.jar:?]
        at org.apache.metron.common.message.metadata.RawMessageUtil.getRawMessage(RawMessageUtil.java:55) ~[stormjar.jar:?]
        at org.apache.metron.parsers.bolt.ParserBolt.execute(ParserBolt.java:251) [stormjar.jar:?]
        at org.apache.storm.daemon.executor$fn__10195$tuple_action_fn__10197.invoke(executor.clj:735) [storm-core-1.1.0.2.6.5.1050-37.jar:1.1.0.2.6.5.1050-37]
        at org.apache.storm.daemon.executor$mk_task_receiver$fn__10114.invoke(executor.clj:466) [storm-core-1.1.0.2.6.5.1050-37.jar:1.1.0.2.6.5.1050-37]
        at org.apache.storm.disruptor$clojure_handler$reify__4137.onEvent(disruptor.clj:40) [storm-core-1.1.0.2.6.5.1050-37.jar:1.1.0.2.6.5.1050-37]
        at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:472) [storm-core-1.1.0.2.6.5.1050-37.jar:1.1.0.2.6.5.1050-37]
        at org.apache.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:451) [storm-core-1.1.0.2.6.5.1050-37.jar:1.1.0.2.6.5.1050-37]
        at org.apache.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:73) [storm-core-1.1.0.2.6.5.1050-37.jar:1.1.0.2.6.5.1050-37]
        at org.apache.storm.daemon.executor$fn__10195$fn__10208$fn__10263.invoke(executor.clj:855) [storm-core-1.1.0.2.6.5.1050-37.jar:1.1.0.2.6.5.1050-37]
        at org.apache.storm.util$async_loop$fn__1221.invoke(util.clj:484) [storm-core-1.1.0.2.6.5.1050-37.jar:1.1.0.2.6.5.1050-37]
        at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]


How can I debug this?

Thanks

Stéphane

_

Ce message et ses pieces jointes peuvent contenir des informations 
confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce 
message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou 
falsifie. Merci.

This message and its attachments may contain confidential or privileged 
information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete 
this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been 
modified, changed or falsified.
Thank you.



RE: Issue when trying to load JSON

2019-04-25 Thread stephane.davy
Hello,

Actually, I want to keep only the _source part. The full story is that these 
data are a dump from another Elasticsearch cluster. After reading this: 
https://metron.apache.org/current-book/metron-platform/metron-parsers/ParserChaining.html,
 I thought I could do the same with JSON. In this example, the BLOB is a CSV, 
and the parser config is the following:

{
  "parserClassName" : "org.apache.metron.parsers.csv.CSVParser"
  ,"sensorTopic" : "my_topic"
  ,"rawMessageStrategy" : "ENVELOPE"
  ,"rawMessageStrategyConfig" : {
    "messageField" : "payload",
    "metadataPrefix" : ""
  }
  ,"parserConfig": {
    "columns" : {
      "f1": 0,
      "f2": 1,
      "f3": 2
    }
  }
}

My understanding is that, using "ENVELOPE", the parser expects some high-level 
JSON with a CSV in the payload, which is why I wanted to do the same with 
JSON. But as far as I understand, it doesn't seem to work, does it?
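
In the meantime, a pragmatic workaround is to flatten the dump upstream and send 
only the _source objects to Kafka. A minimal sketch with jq, assuming one JSON 
document per line as described above (file names are illustrative):

# Pull the _source object out of each line of the Elasticsearch dump and
# emit it as compact single-line JSON, ready to publish to the sensor topic.
jq -c '._source' es_dump.json > flattened.json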

Stéphane

From: Otto Fowler [mailto:ottobackwa...@gmail.com]
Sent: Thursday, April 25, 2019 17:34
To: user@metron.apache.org
Subject: Re: Issue when trying to load JSON

Also, our support for nested, unflattened json isn’t great to begin with.

Stephane,  can you state your use case?

Do you want to get _source only to transform it? Or do you want to use _source 
as the message and discard the top-level fields? Other?




On April 25, 2019 at 11:31:36, Otto Fowler 
(ottobackwa...@gmail.com) wrote:
I’m not sure about the name, I’m more thinking about the case.
I’m not sure this is an enveloped issue, or a new feature for the json map 
parser ( or if you could do it with the jsonMap parser and JSONPath )




On April 25, 2019 at 11:23:25, Simon Elliston Ball 
(si...@simonellistonball.com) wrote:
Seems like this would be a good additional strategy, something like 
ENVELOPE_PARSED? Any thoughts on a good name?

On Thu, 25 Apr 2019 at 16:20, Otto Fowler <ottobackwa...@gmail.com> wrote:
So, the enveloped message doesn't support getting an already parsed json 
object from the enveloping json; we would have to do some work to support this. 
Even if we _could_ wrangle it in there now, from what I can see we would still 
have to serialize to bytes to pass to the actual parser, and that would be 
inefficient.
Can you open a jira with the information you provided?




On April 25, 2019 at 11:12:38, Otto Fowler 
(ottobackwa...@gmail.com) wrote:
Raw message in this case assumes that the raw message is a String embedded in 
the json field that you supply, not a nested json object, so it is looking for


“_source” : “some other embedded string of some format like syslog in json”

There are other message strategies, but I’m not sure they would work in this 
instance.  I’ll keep looking. hopefully someone more familiar will jump in.



On April 25, 2019 at 10:48:06, 
stephane.d...@orange.com 
(stephane.d...@orange.com) wrote:
Hello,

I’m trying to load some JSON data which has the following structure (this is a 
sample):

{
  "_index": "indexing",
  "_type": "Event",
  "_id": "AWAkTAefYn0uCUpkHmCy",
  "_score": 1,
  "_source": {
"dst": "127.0.0.1",
"devTimeEpoch": "151243734",
"dstPort": "0",
"srcPort": "80",
"src": "194.51.198.185"
  }
}

In my file, everything is on the same line. My parser config is the following:

{
  "parserClassName": "org.apache.metron.parsers.json.JSONMapParser",
  "filterClassName": null,
  "sensorTopic": "my_topic",
  "outputTopic": null,
  "errorTopic": null,
  "writerClassName": null,
  "errorWriterClassName": null,
  "readMetadata": true,
  "mergeMetadata": true,
  "numWorkers": 2,
  "numAckers": null,
  "spoutParallelism": 1,
  "spoutNumTasks": 1,
  "parserParallelism": 2,
  "parserNumTasks": 2,
  "errorWriterParallelism": 1,
  "errorWriterNumTasks": 1,
  "spoutConfig": {},
  "securityProtocol": null,
  "stormConfig": {},
  "parserConfig": {
  },
  "fieldTransformations": [
   {
 "transformation":"RENAME",
 "config": {
"dst": "ip_dst_addr",
"src": "ip_src_addr",
"srcPort": "ip_src_port",
"dstPort": "ip_dst_port",
"devTimeEpoch": "timestamp"
 }
   }
  ],
  "cacheConfig": {},
  "rawMessageStrategy": "ENVELOPE",
  "rawMessageStrategyConfig": {
"messageField": "_source"
  }
}

But in Storm I get the following errors:

2019-04-25 16:45:22.225 o.a.s.d.executor Thread-5-parserBolt-executor[8 8] [ERROR]
java.lang.ClassCastException: java.util.LinkedHashMap cannot be cast to java.lang.String
        at org.apache.metron.common.message.metadata.EnvelopedRawMessageStrategy.get(EnvelopedRawMessageStrategy.java:78) ~[stormjar.jar:?]
        at org.apache.metron.common.message.metadata.RawMessageStrategies.get(RawMessageStrategies.java:54) ~[stormjar.jar:?]
        at org.apache.metron.common.message.metadata.

RE: Various questions around profiler

2019-04-25 Thread stephane.davy
OK, I finally found the problem when pasting the whole error stack into the mail:

Caused by: java.lang.RuntimeException: Unexpected version format: 11.0.3

The first Java in my path is Java 11. When switching to Java 8, it worked 
correctly.
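
For anyone hitting the same thing, pointing the shell at a Java 8 install before 
launching the REPL looks roughly like this (the JDK path is an assumption; use 
your own install location):

# Make Java 8 the first java on the PATH for this shell.
export JAVA_HOME=/usr/jdk64/jdk1.8.0_112
export PATH="$JAVA_HOME/bin:$PATH"
java -version   # should now report 1.8.x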

Stéphane


From: Nick Allen [mailto:n...@nickallen.org]
Sent: Thursday, April 25, 2019 17:35
To: user@metron.apache.org
Subject: Re: Various questions around profiler

(1) Did you launch the REPL with the -z  option?

(2) What user are you running the REPL as?

(3) Can you scan the table using the 'hbase shell'? Something like this...
echo "scan 'profiler'" | hbase shell

(4) Can you show the full session from launching the REPL, running the 
PROFILE_GET, and including the full stack trace and error?




On Thu, Apr 25, 2019 at 9:44 AM <stephane.d...@orange.com> wrote:
Well, it seems that I have another issue right now:

[Stellar]>>> PROFILE_GET('simple_count','22.0.35.5', PROFILE_FIXED(30, 
'MINUTES'))
[!] Unable to parse: PROFILE_GET('simple_count','22.0.35.5', PROFILE_FIXED(30, 
'MINUTES')) due to: Unable to access table: profiler

It looks like a permission issue, but I haven't set up any security right now.

Any idea?


From: Nick Allen [mailto:n...@nickallen.org]
Sent: Thursday, April 25, 2019 14:29
To: user@metron.apache.org
Subject: Re: Various questions around profiler


Try querying for that record in the REPL with PROFILE_GET and then using 
MAP_GET on it.  It will likely work as you expect.

value := PROFILE_GET("test", ...)
MAP_GET("sum_rcvd_bytes", value)



On Thu, Apr 25, 2019 at 2:38 AM <stephane.d...@orange.com> wrote:
Hello Nick,

Thanks for your answer. Well, I don't know what the issue was, but after a 
restart the profiling data went to HBase. With the configuration below ("result": 
"{'count': count, 'sum_rcvd_bytes': sum_rcvd_bytes}"), I get something like 
this in HBase:

value=\x01\x00java.util.HashMa\xF0\x01\x02\x03\x01coun\xF4\x02\x8A\x03\x03\x0 
\x00\x00\x00\x00\x00\x1Ab\x18   1sum_rcvd_byte\xF3\x09\xDC\x9D\xC3\x09

So, not sure that it is usable. Do not hesitate to share your upcoming work on 
all of this ☺

Cordialement / Best regards

Stéphane

From: Nick Allen [mailto:n...@nickallen.org]
Sent: Wednesday, April 24, 2019 15:08
To: user@metron.apache.org
Subject: Re: Various questions around profiler

>  In the various examples I’ve found, each profile computes only one value. Is 
> it possible to do something like that:

Currently, each profile produces only a single value.  But I've had the same 
thought as you.  It would be nice to be able to persist multiple values in one 
shot.  Feel free to create a JIRA if you have thoughts on how that might work.


> From the Stellar CLI, it seems to work fine, but when I try during data 
> ingest, I see no data coming into the profiler table. Please note that I've 
> waited for the 15 minutes, and that I have deleted data in the profiler table 
> using the "truncate_preserve" command in hbase.

(1) Have you changed any of the default Profiler settings like the profile 
period?

(2) In the Stellar REPL (CLI) did you test with the real telemetry from Kafka 
or just mock messages?  Retrieve some of the real streaming telemetry using 
`KAFKA_GET` and test that against your profile definition.  Maybe there is a 
difference between the mock data that you tested with versus the real streaming 
data.

(3) To debug further, in the Storm UI go to the Profiler topology, then turn on 
DEBUG logging for the package "org.apache.metron.profiler".  Leave this extra 
logging running for at least the length of your profile period (by default 15 
minutes) and see what the logs show.


>  The data in the profiler table can be pulled with Stellar functions which 
> include some advanced features like statistics, cardinality, but is it 
> possible to access all of this from a Java / Scala / any other language?

There is an internal class that the PROFILE_GET Stellar function uses to 
actually retrieve the profiles from HBase.  You could possibly use this as a 
Java API.  But this is definitely not ideal.

I have done some proof-of-concept work to store the profile data using Phoenix 
in Hbase, which would allow you to access all of the profile data using 
SQL/JDBC/ODBC.  That would allow you to retrieve that data with the tooling of 
your choice, even including something like a Zeppelin Notebook.  When I have 
some cycles, I'd like to share that with the community.

On Wed, Apr 24, 2019 at 4:30 AM <stephane.d...@orange.com> wrote:
Hello everybody,

I've been playing with Metron for a few weeks now; it is really a very exciting 
project, and I'd first like to thank all the contributors. I'm currently 
investigating the use of the profiler. I've tested it with the basic example 
of counting IP addresses as explained in the doc, and now it's time for questions:

-

RE: Various questions around profiler

2019-04-26 Thread stephane.davy
Hello Nick,

Just to confirm that it works perfectly ☺

Stéphane

From: Nick Allen [mailto:n...@nickallen.org]
Sent: Thursday, April 25, 2019 14:29
To: user@metron.apache.org
Subject: Re: Various questions around profiler


Try querying for that record in the REPL with PROFILE_GET and then using 
MAP_GET on it.  It will likely work as you expect.

value := PROFILE_GET("test", ...)
MAP_GET("sum_rcvd_bytes", value)



On Thu, Apr 25, 2019 at 2:38 AM <stephane.d...@orange.com> wrote:
Hello Nick,

Thanks for your answer. Well, I don't know what the issue was, but after a 
restart the profiling data went to HBase. With the configuration below ("result": 
"{'count': count, 'sum_rcvd_bytes': sum_rcvd_bytes}"), I get something like 
this in HBase:

value=\x01\x00java.util.HashMa\xF0\x01\x02\x03\x01coun\xF4\x02\x8A\x03\x03\x0 
\x00\x00\x00\x00\x00\x1Ab\x18   1sum_rcvd_byte\xF3\x09\xDC\x9D\xC3\x09

So, not sure that it is usable. Do not hesitate to share your upcoming work on 
all of this ☺

Cordialement / Best regards

Stéphane

From: Nick Allen [mailto:n...@nickallen.org]
Sent: Wednesday, April 24, 2019 15:08
To: user@metron.apache.org
Subject: Re: Various questions around profiler

>  In the various examples I’ve found, each profile computes only one value. Is 
> it possible to do something like that:

Currently, each profile produces only a single value.  But I've had the same 
thought as you.  It would be nice to be able to persist multiple values in one 
shot.  Feel free to create a JIRA if you have thoughts on how that might work.


> From the Stellar CLI, it seems to work fine, but when I try during data 
> ingest, I see no data coming into the profiler table. Please note that I've 
> waited for the 15 minutes, and that I have deleted data in the profiler table 
> using the "truncate_preserve" command in hbase.

(1) Have you changed any of the default Profiler settings like the profile 
period?

(2) In the Stellar REPL (CLI) did you test with the real telemetry from Kafka 
or just mock messages?  Retrieve some of the real streaming telemetry using 
`KAFKA_GET` and test that against your profile definition.  Maybe there is a 
difference between the mock data that you tested with versus the real streaming 
data.

(3) To debug further, in the Storm UI go to the Profiler topology, then turn on 
DEBUG logging for the package "org.apache.metron.profiler".  Leave this extra 
logging running for at least the length of your profile period (by default 15 
minutes) and see what the logs show.


>  The data in the profiler table can be pulled with Stellar functions which 
> include some advanced features like statistics, cardinality, but is it 
> possible to access all of this from a Java / Scala / any other language?

There is an internal class that the PROFILE_GET Stellar function uses to 
actually retrieve the profiles from HBase.  You could possibly use this as a 
Java API.  But this is definitely not ideal.

I have done some proof-of-concept work to store the profile data using Phoenix 
in Hbase, which would allow you to access all of the profile data using 
SQL/JDBC/ODBC.  That would allow you to retrieve that data with the tooling of 
your choice, even including something like a Zeppelin Notebook.  When I have 
some cycles, I'd like to share that with the community.

On Wed, Apr 24, 2019 at 4:30 AM <stephane.d...@orange.com> wrote:
Hello everybody,

I've been playing with Metron for a few weeks now; it is really a very exciting 
project, and I'd first like to thank all the contributors. I'm currently 
investigating the use of the profiler. I've tested it with the basic example 
of counting IP addresses as explained in the doc, and now it's time for questions:

-  In the various examples I’ve found, each profile computes only one 
value. Is it possible to do something like that:

{
  "profiles": [
    {
      "profile": "test",
      "foreach": "ip_src_addr",
      "onlyif": "exists(ip_src_addr)",
      "init":   { "count": 0, "sum_rcvd_bytes": 0 },
      "update": { "count": "count + 1",
                  "sum_rcvd_bytes": "sum_rcvd_bytes + rcvdbyte" },
      "result": "{'count': count, 'sum_rcvd_bytes': sum_rcvd_bytes}"
    }
  ]
}



From the Stellar CLI, it seems to work fine, but when I try during data 
ingest, I see no data coming into the profiler table. Please note that I've 
waited for the 15 minutes, and that I have deleted data in the profiler table 
using the "truncate_preserve" command in hbase.

-  In case of issue, what is the right procedure to reinitialize all 
the profiler stack?

-  The data in the profiler table can be pulled with Stellar functions 
which include some advanced features like statistics, cardinality, but is it 
possible to access all of this from a Java / Scala / any other language?

-  The MaaS service seems to apply to the incoming data only, ho

Very low throughput on topologies

2019-05-14 Thread stephane.davy
Hello happy metron users,

I've a Metron cluster based on Hortonworks CP, and I've set up Kerberos on 
top of it, as you all probably have done since we deal with security :)

It seems that everything is working fine (Kerberos, Ranger, ...), but I'm facing 
an issue regarding the overall throughput.

I feed my cluster with Nifi, here is what I do:
Test 1:

-  Send 2 lines of logs to Kafka sensor topic with Nifi

-  Use Kafka CLI consumer to read messages from sensor topic: response 
is immediate

-  Use kafka CLI consumer to read messages from enrichment topic: 
messages are coming after nearly 20 s
  Test 2:

-  Send 200 lines of logs to Kafka sensor topic with Nifi

-  Use Kafka CLI consumer to read messages from sensor topic: response 
is immediate

-  Use kafka CLI consumer to read messages from enrichment topic: some 
messages are coming immediately, but they seem to come roughly 10 at a time, with 
many seconds between each batch


It's probably related to Storm configuration, but I don't know where to go now. 
I've tried to change various parameters like topology.max.spout.pending 
(currently set to 500), but no improvement
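
For completeness, the consumer side of these tests is the stock console 
consumer, something like the sketch below; the broker address, port, and config 
path are assumptions, and on a kerberized cluster the consumer config must carry 
the usual security settings:

# Watch messages arrive on the enrichments topic as the parser emits them.
/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh \
  --bootstrap-server broker1:6667 \
  --topic enrichments \
  --consumer.config /etc/kafka/conf/consumer.properties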

Thanks for your help

Stéphane


_

Ce message et ses pieces jointes peuvent contenir des informations 
confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce 
message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou 
falsifie. Merci.

This message and its attachments may contain confidential or privileged 
information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete 
this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been 
modified, changed or falsified.
Thank you.



RE: Very low throughput on topologies

2019-05-15 Thread stephane.davy
Hello Nick,

Thanks for your answer. By the way, the problem already happens before 
indexing, at the parser level. It takes a long time to go from the sensor topic to 
the "enrichments" topic, and again many seconds to go from the "enrichments" topic 
to the "indexing" topic.

I've tried the recommendations described here: 
https://github.com/apache/storm/blob/master/docs/Performance.md but no change. 
The problem with Kerberos is that it is no longer possible to access the Storm UI 
without some tweaks, which are blocked by the administrator on my computer.


From: Nick Allen [mailto:n...@nickallen.org]
Sent: Tuesday, May 14, 2019 23:39
To: user@metron.apache.org
Subject: Re: Very low throughput on topologies

Have you increased the indexing "batch_size"?  That is the first knob to start 
tuning.

https://github.com/apache/metron/tree/master/metron-platform/metron-indexing#sensor-indexing-configuration



On Tue, May 14, 2019 at 10:26 AM <stephane.d...@orange.com> wrote:
Hello happy metron users,

I've a Metron cluster based on Hortonworks CP, and I've set up Kerberos on 
top of it, as you all probably have done since we deal with security ☺

It seems that everything is working fine (Kerberos, Ranger, ...), but I'm facing 
an issue regarding the overall throughput.

I feed my cluster with Nifi, here is what I do:
Test 1:

-  Send 2 lines of logs to Kafka sensor topic with Nifi

-  Use Kafka CLI consumer to read messages from sensor topic: response 
is immediate

-  Use kafka CLI consumer to read messages from enrichment topic: 
messages are coming after nearly 20 s
  Test 2:

-  Send 200 lines of logs to Kafka sensor topic with Nifi

-  Use Kafka CLI consumer to read messages from sensor topic: response 
is immediate

-  Use kafka CLI consumer to read messages from enrichment topic: some 
messages are coming immediately, but they seem to come roughly 10 at a time, with 
many seconds between each batch


It’s probably related to Storm configuration, but I don’t know where to go now. 
I’ve tried to change various parameters like topology.max.spout.pending 
(currently set to 500), but no improvement

Thanks for your help

Stéphane


_



Ce message et ses pieces jointes peuvent contenir des informations 
confidentielles ou privilegiees et ne doivent donc

pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce 
message par erreur, veuillez le signaler

a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
electroniques etant susceptibles d'alteration,

Orange decline toute responsabilite si ce message a ete altere, deforme ou 
falsifie. Merci.



This message and its attachments may contain confidential or privileged 
information that may be protected by law;

they should not be distributed, used or copied without authorisation.

If you have received this email in error, please notify the sender and delete 
this message and its attachments.

As emails may be altered, Orange is not liable for messages that have been 
modified, changed or falsified.

Thank you.

_

Ce message et ses pieces jointes peuvent contenir des informations 
confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce 
message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou 
falsifie. Merci.

This message and its attachments may contain confidential or privileged 
information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete 
this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been 
modified, changed or falsified.
Thank you.



RE: Very low throughput on topologies

2019-05-16 Thread stephane.davy
Hello Michael,

So, using curl and the API, I've been able to collect some statistics. 
Currently, it is a test platform with nearly no activity. I've set up a basic 
parser, with the following topology:

-  6 ackers (I've 6 Kafka partitions per topic)

-  Spout parallelism = 6

-  Spout # of tasks = 6

-  Parser parallelism = 24

-  Parser # of tasks = 24

I inject one line of logs with Nifi on my sensor topic. As a reminder, it needs 
roughly 10 s to be visible on the enrichments topic. Here are some statistics:

  "spouts": [
{
  "emitted": 1160,
  "spoutId": "kafkaSpout",
  "requestedMemOnHeap": 128,
  "errorTime": null,
  "tasks": 6,
  "errorHost": "",
  "failed": 0,
  "completeLatency": "3963.078",
  "executors": 6,
  "encodedSpoutId": "kafkaSpout",
  "transferred": 1160,
  "errorPort": "",
  "requestedMemOffHeap": 0,
  "errorLapsedSecs": null,
  "acked": 1020,
  "requestedCpu": 10,
  "lastError": "",
  "errorWorkerLogLink": ""
}

This completeLatency looks very high, doesn't it?

And for bolt:
{
  "emitted": 0,
  "requestedMemOnHeap": 128,
  "errorTime": null,
  "tasks": 12,
  "errorHost": "",
  "failed": 0,
  "boltId": "parserBolt",
  "executors": 12,
  "processLatency": "832.962",
  "executeLatency": "1.391",
  "transferred": 0,
  "errorPort": "",
  "requestedMemOffHeap": 0,
  "errorLapsedSecs": null,
  "acked": 3680,
  "requestedCpu": 10,
  "encodedBoltId": "parserBolt",
  "lastError": "",
  "executed": 3680,
  "capacity": "0.003",
  "errorWorkerLogLink": ""
}

So, my understanding is that it takes a lot of time to ack tuples, but I don't 
know where to go now. As said below, I've tried the tweaks mentioned here: 
https://github.com/apache/storm/blob/master/docs/Performance.md but no change. 
It looks like we are trying to fill a bucket, and send the data after a given 
timeout if the bucket is not full. But I don't see any timeout that looks like 
10 or 20 seconds in the Storm configuration.
As a reminder, I've Kerberos enabled on my platform, but everything seems to 
work fine except Metron ingestion.

Thanks for your help,

Stéphane

From: Michael Miklavcic [mailto:michael.miklav...@gmail.com]
Sent: Wednesday, May 15, 2019 16:03
To: user@metron.apache.org
Subject: Re: Very low throughput on topologies

You could use curl from the cli. But if this is something you're testing out on 
your local machine, I'd probably start without Kerberos enabled and work the 
perf knobs there first. You should be able to see the "complete latency" from 
the Storm UI on each running topology.

On Wed, May 15, 2019 at 1:33 AM <stephane.d...@orange.com> wrote:
Hello Nick,

Thanks for your answer. By the way, the problem already happens before 
indexing, at the parser level. It takes a long time to go from the sensor topic to 
the "enrichments" topic, and again many seconds to go from the "enrichments" topic 
to the "indexing" topic.

I've tried the recommendations described here: 
https://github.com/apache/storm/blob/master/docs/Performance.md but no change. 
The problem with Kerberos is that it is no longer possible to access the Storm UI 
without some tweaks, which are blocked by the administrator on my computer.


From: Nick Allen [mailto:n...@nickallen.org]
Sent: Tuesday, May 14, 2019 23:39
To: user@metron.apache.org
Subject: Re: Very low throughput on topologies

Have you increased the indexing "batch_size"?  That is the first knob to start 
tuning.

https://github.com/apache/metron/tree/master/metron-platform/metron-indexing#sensor-indexing-configuration



On Tue, May 14, 2019 at 10:26 AM <stephane.d...@orange.com> wrote:
Hello happy metron users,

I've a Metron cluster based on Hortonworks CP, and I've set up Kerberos on 
top of it, as you all probably have done since we deal with security ☺

It seems that everything is working fine (Kerberos, Ranger, ...), but I'm facing 
an issue regarding the overall throughput.

I feed my cluster with Nifi, here is what I do:
Test 1:

-  Send 2 lines of logs to Kafka sensor topic with Nifi

-  Use Kafka CLI consumer to read messages from sensor topic: response 
is immediate

-  Use kafka CLI consumer to read messages from enrichment topic: 
messages are coming after nearly 20 s
  Test 2:

-  Send 200 lines of logs to Kafka sensor topic with Nifi

-  Use Kafka CLI consumer to read messages from sensor topic: response 
is immediate

-  Use kafka CLI consumer to read messages from enrichment topic: some 
messages are coming immediately, but they seem to come roughly 10 at a time, with 
many seconds between each batch


It’s probably related to Storm configuration, but I don’t know where to go now. 
I’ve tried to change various parameters like topology.max.spout.pending 
(currently set to 500

RE: Very low throughput on topologies

2019-05-16 Thread stephane.davy
Hello Simon,

It is what it looks like, yes, but I've set topology.flush.tuple.freq.millis to 
10, with no change. Moreover, if I send, let's say, 2 lines, it still takes a 
lot of time to be fully processed.




From: Simon Elliston Ball [mailto:si...@simonellistonball.com]
Sent: Thursday, May 16, 2019 11:20
To: user@metron.apache.org
Subject: Re: Very low throughput on topologies

If you're sending low volumes to test, you may be waiting on the batch timeout, 
i.e. not triggering a batch flush by volume but waiting for the timeout, 
which may explain your latency.

Simon

On Thu, 16 May 2019 at 10:04, <stephane.d...@orange.com> wrote:
Hello Michael,

So, using curl and the API, I've been able to collect some statistics. 
Currently, it is a test platform with nearly no activity. I've set up a basic 
parser, with the following topology:

-  6 ackers (I've 6 Kafka partitions per topic)

-  Spout parallelism = 6

-  Spout # of tasks = 6

-  Parser parallelism = 24

-  Parser # of tasks = 24

I inject one line of logs with Nifi on my sensor topic. As a reminder, it needs 
roughly 10 s to be visible on the enrichments topic. Here are some statistics:

  "spouts": [
{
  "emitted": 1160,
  "spoutId": "kafkaSpout",
  "requestedMemOnHeap": 128,
  "errorTime": null,
  "tasks": 6,
  "errorHost": "",
  "failed": 0,
  "completeLatency": "3963.078",
  "executors": 6,
  "encodedSpoutId": "kafkaSpout",
  "transferred": 1160,
  "errorPort": "",
  "requestedMemOffHeap": 0,
  "errorLapsedSecs": null,
  "acked": 1020,
  "requestedCpu": 10,
  "lastError": "",
  "errorWorkerLogLink": ""
}

This completeLatency looks very high, doesn't it?

And for bolt:
{
  "emitted": 0,
  "requestedMemOnHeap": 128,
  "errorTime": null,
  "tasks": 12,
  "errorHost": "",
  "failed": 0,
  "boltId": "parserBolt",
  "executors": 12,
  "processLatency": "832.962",
  "executeLatency": "1.391",
  "transferred": 0,
  "errorPort": "",
  "requestedMemOffHeap": 0,
  "errorLapsedSecs": null,
  "acked": 3680,
  "requestedCpu": 10,
  "encodedBoltId": "parserBolt",
  "lastError": "",
  "executed": 3680,
  "capacity": "0.003",
  "errorWorkerLogLink": ""
}

So, my understanding is that it takes a lot of time to ack tuples, but I don't 
know where to go now. As said below, I've tried the tweaks mentioned here: 
https://github.com/apache/storm/blob/master/docs/Performance.md but no change. 
It looks like we are trying to fill a bucket, and send the data after a given 
timeout if the bucket is not full. But I don't see any timeout that looks like 
10 or 20 seconds in the Storm configuration.
As a reminder, I've Kerberos enabled on my platform, but everything seems to 
work fine except Metron ingestion.

Thanks for your help,

Stéphane

From: Michael Miklavcic 
[mailto:michael.miklav...@gmail.com]
Sent: Wednesday, May 15, 2019 16:03
To: user@metron.apache.org
Subject: Re: Very low throughput on topologies

You could use curl from the cli. But if this is something you're testing out on 
your local machine, I'd probably start without Kerberos enabled and work the 
perf knobs there first. You should be able to see the "complete latency" from 
the Storm UI on each running topology.

On Wed, May 15, 2019 at 1:33 AM <stephane.d...@orange.com> wrote:
Hello Nick,

Thanks for your answer. By the way, the problem already happens before 
indexing, at the parser level. It takes a long time to go from the sensor topic to 
the "enrichments" topic, and again many seconds to go from the "enrichments" topic 
to the "indexing" topic.

I've tried the recommendations described here: 
https://github.com/apache/storm/blob/master/docs/Performance.md but no change. 
The problem with Kerberos is that it is no longer possible to access the Storm UI 
without some tweaks, which are blocked by the administrator on my computer.


From: Nick Allen [mailto:n...@nickallen.org]
Sent: Tuesday, May 14, 2019 23:39
To: user@metron.apache.org
Subject: Re: Very low throughput on topologies

Have you increased the indexing "batch_size"?  That is the first knob to start 
tuning.

https://github.com/apache/metron/tree/master/metron-platform/metron-indexing#sensor-indexing-configuration



On Tue, May 14, 2019 at 10:26 AM <stephane.d...@orange.com> wrote:
Hello happy metron users,

I’ve a Metron cluster based on Hortonworks CP, and I’ve setup Kerberos on the 
top of it, as you all probably have done since we deal with security ☺

It seems that everything is working fine, Kerberos, Ranger, etc., but I’m facing an 
issue regarding the overall throughput.

I feed my cluster with NiFi; here is what I do:
Test 1:

-  Send 2 lines of logs to the Kafka sensor topic with NiFi

-  Use Kafka CLI consu

RE: Very low throughput on topologies

2019-05-16 Thread stephane.davy
Hello Simon,

If you talk about this: 
https://github.com/apache/metron/tree/master/metron-platform/metron-indexing#sensor-indexing-configuration

My settings are (I’ve tried many changes here):
{
  "hdfs": {
"batchSize": 10,
"batchTimeout": 1,
"enabled": true,
"index": "ansi"
  },
  "elasticsearch": {
"batchSize": 10,
"batchTimeout": 1,
"enabled": true,
"index": "ansi"
  },
  "solr": {
"batchSize": 10,
"enabled": false,
"index": "ansi"
  }
}

What is not clear for me is how these settings can influence the 
random_indexing and batch_indexing topologies. Once again, the first delay I 
see is between the source topic and the “enrichments” topic. Unless I’m wrong, 
data are moved between these 2 topics by the parsing topology, without any 
indexing action.

Stéphane


From: Simon Elliston Ball [mailto:si...@simonellistonball.com]
Sent: Thursday, May 16, 2019 11:40
To: user@metron.apache.org
Subject: Re: Very low throughput on topologies

Can you share your Metron batch size config and timeouts?

On Thu, 16 May 2019 at 11:31, stephane.d...@orange.com wrote:
Hello Simon,

That is what it looks like, yes, but I’ve set topology.flush.tuple.freq.millis to 
10 with no change. Moreover, if I send, say, 2 lines, it still takes a 
lot of time for them to be fully processed.




From: Simon Elliston Ball 
[mailto:si...@simonellistonball.com]
Sent: Thursday, May 16, 2019 11:20
To: user@metron.apache.org
Subject: Re: Very low throughput on topologies

If you’re sending low volumes to test, you may be waiting on the batch timeout, 
i.e. not triggering a batch flush by volume but waiting for the timeout, 
which may explain your latency.

Simon


RE: Very low throughput on topologies

2019-05-16 Thread stephane.davy
Hello Nick,

I have 4 good physical servers with 32 GB of RAM each, SSD drives, etc., and no 
activity on these servers. Actually, at the beginning of my tests I didn’t 
face this kind of issue. It seems to be related to the fact that I’ve enabled 
Kerberos; I’m currently reverting back to no Kerberos.



From: Nick Allen [mailto:n...@nickallen.org]
Sent: Thursday, May 16, 2019 15:01
To: user@metron.apache.org
Subject: Re: Very low throughput on topologies

Can you describe the hardware you are running on?  How many nodes? Cloud or 
bare metal? Memory, CPU, etc.


RE: Very low throughput on topologies

2019-05-20 Thread stephane.davy
Hello Nick,

You are right, it was related to the batchSize and batchTimeout settings, but I was 
confused about where they are set: I was tweaking the indexing ones. Now 
I understand these settings a little better and I can see their 
effects.

By the way, I still have one question: in the link you mention below, it is 
said that if batchTimeout is not set, it falls back to a fraction of the 
topology.message.timeout.secs Storm parameter. Do you know how we can get this 
fraction?

Thanks for your help ☺


From: Nick Allen [mailto:n...@nickallen.org]
Sent: Thursday, May 16, 2019 15:08
To: user@metron.apache.org
Subject: Re: Very low throughput on topologies

Ok, now I understand a little better.  You are sending low volumes of telemetry 
just for testing.

I think Simon is on the right track.  There is also a batchSize and 
batchTimeout setting for Parsers that you can find at the link below.

https://metron.apache.org/current-book/metron-platform/metron-parsers/index.html
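
For concreteness, those parser-level batch settings live inside the parserConfig 
map of the sensor's parser configuration in ZooKeeper, not in the indexing 
configuration. A sketch for a sensor named "ansi", with field placement per the 
metron-parsers docs linked above (the parserClassName here is just an 
illustrative placeholder):

{
  "parserClassName": "org.apache.metron.parsers.GrokParser",
  "sensorTopic": "ansi",
  "parserConfig": {
    "batchSize": 1,
    "batchTimeout": 2
  }
}

With batchSize set to 1 the parser writer flushes every message immediately, 
which is what you want for single-line smoke tests like the one above.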







RE: Very low throughput on topologies

2019-05-21 Thread stephane.davy
Thanks Nick.



From: Nick Allen [mailto:n...@nickallen.org]
Sent: Tuesday, May 21, 2019 14:15
To: user@metron.apache.org
Subject: Re: Very low throughput on topologies

> In the link you mention below, it is said that if batchTimeout is not set, 
> it falls back to a fraction of the topology.message.timeout.secs Storm 
> parameter. Do you know how we can get this fraction?

By default, the `batchTimeout` is set to 1/2 of the 
`topology.message.timeout.secs`.

https://github.com/apache/metron/blob/51d1c812c1e45f57da8c27fe37fd13797707884e/metron-platform/metron-writer/metron-writer-storm/src/main/java/org/apache/metron/writer/bolt/BatchTimeoutHelper.java#L118-L122
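
As a worked example of that fallback (a sketch, not the actual Metron code): 
Storm's topology.message.timeout.secs defaults to 30, so an unset batchTimeout 
flushes partial batches after roughly 15 s, which lines up with the 10 to 20 
second delays reported earlier in this thread.

def effective_batch_timeout(batch_timeout_secs, topology_message_timeout_secs=30):
    # An explicit, positive batchTimeout wins; otherwise fall back to half
    # the Storm message timeout, per the default Nick describes above.
    if batch_timeout_secs and batch_timeout_secs > 0:
        return batch_timeout_secs
    return topology_message_timeout_secs // 2

print(effective_batch_timeout(0))   # 15, with Storm's default 30 s message timeout
print(effective_batch_timeout(1))   # 1, an explicit setting is used as-is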





RE: Very low throughput on topologies

2019-05-21 Thread stephane.davy
Yes, this is what I’m currently playing with to find the best batch size. Thanks 
for pointing out the link.

Stéphane


From: Michael Miklavcic [mailto:michael.miklav...@gmail.com]
Sent: Tuesday, May 21, 2019 16:12
To: user@metron.apache.org
Subject: Re: Very low throughput on topologies

Also take a look at this - 
https://github.com/apache/metron/tree/master/metron-platform/metron-writer#bulk-message-writing


REST service for MaaS

2019-05-23 Thread stephane.davy
Hello all,

I'm going through the MaaS documentation and I see that the example is based on 
the Python / Flask REST service. I was wondering what you all use in a production 
context. Is Python / Flask a good choice under heavy load? Do 
some of you use another framework, like Scala Play for example?

My goal is to use Spark models: training is done in batch mode, and I want to 
perform prediction "on demand" through the MaaS framework. Does that make sense?

Thanks,

Stéphane
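
For what it's worth, the service MaaS deploys can be small in any framework. 
Below is a minimal Flask sketch of an on-demand scoring endpoint; the route, 
query parameter, response fields, and port are illustrative only, not the MaaS 
contract, and the model-loading call is a placeholder for a batch-trained Spark 
model:

from flask import Flask, jsonify, request

app = Flask(__name__)

# Placeholder: load the batch-trained model once at startup, e.g. a Spark
# model exported to a local format. Hypothetical helper, not a real API:
# model = load_exported_model("/path/to/model")

@app.route("/apply", methods=["GET"])
def apply():
    host = request.args.get("host", "")
    # score = model.predict(host)  # placeholder scoring call
    score = 0.0
    return jsonify({"host": host, "score": score})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=1500)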





RE: Metron 0.7/HDP-2.5 installation issues

2019-05-23 Thread stephane.davy
On my side I’ve worked with this: 
https://docs.hortonworks.com/HDPDocuments/HCP1/HCP-1.9.1/installation.html and 
it works. My HDP is 2.6.5, because it seems that HCP isn’t currently ready for 
HDP 3.




From: Michael Miklavcic [mailto:michael.miklav...@gmail.com]
Sent: Thursday, May 23, 2019 17:56
To: user@metron.apache.org
Subject: Re: Metron 0.7/HDP-2.5 installation issues

Someone may have a more recent set of instructions they can share, but the 
instructions you're referring to are very outdated. Ymmv

On Thu, May 23, 2019, 9:50 AM Sanket Sharma sanket.sha...@dukstra.com wrote:
Hi,

I am trying to install Metron on a single node (CentOS 7) using the 
instructions here: 
https://cwiki.apache.org/confluence/display/METRON/Metron+0.4.1+with+HDP+2.5+bare-metal+install+on+Centos+7+with+MariaDB+for+Metron+REST

I'm using the latest Ambari (2.7) and HDP-2.5. Since the default Ambari install did 
not have the HDP-2.5 profile, I ended up adding it using the service definition 
URL 
http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.5.3.0/HDP-2.5.3.0-37.xml
 (as listed on this page: 
https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.0.0/bk_ambari-installation/content/hdp_25_repositories.html)

Everything works as expected until I get to the service selection step of 
the Ambari setup wizard. However, the following services are missing from the 
HDP-2.5 profile:


Elasticsearch
Kibana
OpenTAXII
Pycapa

The full list of services that is available is below:

YARN + MapReduce2
Tez
Hive
HBase
Pig
Sqoop
Oozie
ZooKeeper
Falcon
Storm
Flume
Accumulo
Infra Solr
Ambari Metrics
Atlas
Kafka
Knox
Log Search
Ranger
Ranger KMS
SmartSense
Spark
Spark2
Zeppelin Notebook
Mahout
Metron
Slider


Can I complete the installation without Elasticsearch, Kibana, OpenTAXII and 
Pycapa? Can they be configured and installed manually (to work with metron)?

Are there updated installation instructions available anywhere?

Best regards,
Sanket




batch indexing in JSON format

2019-07-15 Thread stephane.davy
Hello all,

 

I have a question regarding batch indexing. As far as I can see, data are stored
in JSON format in HDFS. However, this uses a lot of storage because of
JSON verbosity, enrichments, etc. Is there any way to use Parquet, for example? I
guess it's possible to do it the day after, i.e. read the JSON and
save it in another format with Spark, but is it possible to choose the
format at the batch indexing configuration level?

 

Thanks a lot

 

Stéphane

 

 





RE: batch indexing in JSON format

2019-07-15 Thread stephane.davy
Hello all,

 

Thanks for your useful answers; it all makes sense to me now. So we will 
probably go with post-processing file conversion.

 

Have a good day,

 

Stéphane

 

From: Otto Fowler [mailto:ottobackwa...@gmail.com] 
Sent: Monday, July 15, 2019 16:19
To: user@metron.apache.org
Subject: Re: batch indexing in JSON format

 

We could do something like have some other topology or job that kicks off when 
an HDFS file is closed.

So before we start a new file, we “queue” a log to some conversion topology/job, 
or something like that.

 

 

 

On July 15, 2019 at 10:04:08, Michael Miklavcic (michael.miklav...@gmail.com) 
wrote:

Adding to what Ryan said (and I agree), there are a couple additional 
consequences: 

1.  There are questions around just how optimal an ORC file written in 
real-time can actually be. In order to get columns of data striped effectively, 
you need a sizable number of k rows. That's probably unlikely in real-time, 
though some of these storage formats also have "engines" running that manage 
compactions (like HBase does), but I haven't checked on this in a while. I 
think Kudu may do this, actually, but again that's a whole new storage engine, 
not just a format.

2.  More importantly - loss of data - HDFS is the source of truth. We 
guarantee at-least-once processing. In order to achieve efficient columnar 
storage that makes a columnar format feasible, it's likely that we'd have to 
make larger batches in indexing. This creates a potential for lag in the system 
where we would now have to do more to worry about Storm failures than we do 
currently. With HDFS writing, our partial files are still written even if 
there's a failure in the topology or elsewhere. It does take up more disk 
space, but we felt this was a reasonable tradeoff architecturally for something 
that should be feasible to write ad hoc.

That being said, you could certainly write conversion jobs that should be able 
to lag the real-time processing just enough to get the benefits of real-time 
and still do a decent job of getting your data into a more efficient storage 
format, if you choose.

 

Cheers,

Mike

 

 

On Mon, Jul 15, 2019 at 7:00 AM Ryan Merriman  wrote:

The short answer is no.  Offline conversion to other formats (as you describe) 
is a better approach anyways.  Writing to a Parquet/ORC file is more compute 
intensive than just writing JSON data directly to HDFS and not something you 
need to do in real-time since you have the same data available in ES/Solr.  
This would slow down the batch indexing topology for no real gain.




RE: batch indexing in JSON format

2019-07-15 Thread stephane.davy
Thanks Simon, saving as a Hive table is also what I had in mind, so easy to do 
with Spark.

 

Stéphane

 

From: Simon Elliston Ball [mailto:si...@simonellistonball.com] 
Sent: Monday, July 15, 2019 17:43
To: user@metron.apache.org
Subject: Re: batch indexing in JSON format

 

Most users will have a batch process converting the short-term JSON output into 
ORC or Parquet files, often adding them to Hive tables at the same time. I 
usually do this with a Spark job run every hour, or even every 15 minutes or less 
in some cases for high-throughput environments. Anecdotally, I’ve found ORC 
compresses slightly better than Parquet for most Metron data, but the 
difference is marginal. 
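
A minimal sketch of such a job in PySpark, assuming the default Metron HDFS 
output layout; the paths and the sensor name "ansi" are placeholders to adapt:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("metron-json-to-orc").getOrCreate()

# Read the JSON files the HDFS writer produced for one sensor
# (the input path is an assumption based on Metron's default layout)
df = spark.read.json("hdfs:///apps/metron/indexing/indexed/ansi/*")

# Rewrite as ORC (a real hourly job would select only newly closed files
# rather than re-reading everything with a wildcard)
df.write.mode("append").orc("hdfs:///apps/metron/orc/ansi")

spark.stop()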

 

The reason for this is that the HDFS writer was built with the goal of getting data 
persisted in HDFS as soon as possible, so writing a columnar format would 
introduce latency to the streaming process. I suspect that a dev list 
discussion on schema management and alternative output formats will be 
forthcoming. Handling that with a sensible approach to schema migration is not 
trivial, but certainly desirable.

 

Simon




RE: How to configure Rsyslog omkafka to send log to kafka topic with Kerberos

2020-01-23 Thread stephane.davy
Hello,

 

Here is a piece of configuration:

 action(type="omkafka" name="" broker=[list of kafka brokers] 
partitions.auto="on" topic="your topic"

   confParam=["security.protocol=SASL_PLAINTEXT",

  "sasl.mechanism=GSSAPI",

  "sasl.kerberos.service.name=kafka",

  "sasl.kerberos.principal=your 
principal",

  
"sasl.kerberos.keytab=/etc/security/keytabs/your keytab",

  
"sasl.kerberos.kinit.cmd=/usr/bin/kinit -S 
%{sasl.kerberos.service.name}/%{broker.name} -t %{sasl.kerberos.keytab} -k 
%{sasl.kerberos.principal}"]

  )

 

Unfortunately, it didn’t work when I tested it a few months ago because of a 
bug in rsyslog. I’ve lost the error message, but when I googled it I found some 
discussion about a known bug on the rsyslog side. Maybe it is fixed now.

 

Stéphane

 

 

From: Nick Allen [mailto:n...@nickallen.org] 
Sent: Thursday, January 23, 2020 21:25
To: user@metron.apache.org
Subject: Re: How to configure Rsyslog omkafka to send log to kafka topic with 
Kerberos

 

After reading a bit, I can see that you will want to use ConfParam. Per those 
docs, it looks like omkafka uses librdkafka under the hood. Fortunately, I am 
familiar with librdkafka, and its documentation lists the available settings.

You might also be interested in some documentation for Fastcapa (which is a 
packet capture mechanism in Metron). Fastcapa also uses librdkafka under the 
hood, and we have documentation which describes how to make that work 
with Kerberos. The configuration that you need will most likely be very similar.
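
Since omkafka sits on librdkafka, the same properties can be smoke-tested from 
Python with the confluent-kafka client, which wraps the same library. A sketch, 
where the broker, topic, principal, and keytab are placeholders:

from confluent_kafka import Producer

# These mirror the rsyslog confParam entries above; values are placeholders
conf = {
    "bootstrap.servers": "broker1.example.com:6667",
    "security.protocol": "SASL_PLAINTEXT",
    "sasl.mechanism": "GSSAPI",
    "sasl.kerberos.service.name": "kafka",
    "sasl.kerberos.principal": "rsyslog/host.example.com@EXAMPLE.COM",
    "sasl.kerberos.keytab": "/etc/security/keytabs/rsyslog.keytab",
}

p = Producer(conf)
p.produce("your-topic", b"test message")
p.flush()  # block until delivery or error, so a misconfiguration surfaces immediately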

 

Hope this helps.

 

 

 

On Thu, Jan 23, 2020 at 3:13 PM Nick Allen  wrote:

I do not have familiarity with omkafka, but you need to pass some 
Kafka-specific configuration options when using Kerberos.

 

These links might help you understand what Kafka configuration options are 
needed.  Then you would just need to determine how to make those adjustments 
with omkafka.

*   
https://metron.apache.org/current-book/metron-deployment/Kerberos-manual-setup.html#Push_Data
*   
https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/authentication-with-kerberos/content/kerberos_kafka_producing_events_or_messages_to_kafka_on_a_secured_cluster.html
*   https://kafka.apache.org/documentation/#producerconfigs

 

 

 

 

On Thu, Jan 23, 2020 at 2:56 PM Yu Zhang  wrote:

Hi,

 

I am doing a Metron PoC. Right now I can send rsyslog records to a Kafka topic 
without Kerberos. How do I configure rsyslog omkafka to send logs to a Kafka 
topic with Kerberos?

 

Thanks,

 

Yu Zhang

Security Engineer - Big Data Virtualization and Security

GM | Global Infrastructure

  yu.4.zh...@gm.com

C (303) 503-5481

 

 








RE: Connecting Metron to Elasticsearch with credentials

2020-03-25 Thread stephane.davy
Hello Tom,

 

For now I cannot reach my Metron setup because of COVID-19, but as far as I
remember there is nothing special here: the login / password fields are
available in the Elasticsearch definition to configure your credentials.

 

What trouble are you running into?
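
As a quick sanity check that the credentials are accepted on port 9200 before 
wiring them into Metron, something like this works (host and credentials are 
placeholders):

import requests

# Host, user, and password are placeholders
r = requests.get("http://es-host.example.com:9200",
                 auth=("metron_user", "secret"))
print(r.status_code)  # 200 means the credentials were accepted
print(r.json())       # cluster banner with name and version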

 

 

From: Yerex, Tom [mailto:tom.ye...@ubc.ca] 
Sent: Tuesday, March 24, 2020 22:57
To: user@metron.apache.org
Subject: Connecting Metron to Elasticsearch with credentials

 

Good afternoon,

Our Elasticsearch install requires credentials to connect over port 9200.
Has anyone set up a connection between Metron and Elasticsearch using
credentials and/or can offer some guidance on how to achieve this?

Cheers,

Tom.


