Github user cestella commented on the pull request:

    https://github.com/apache/incubator-metron/pull/127#issuecomment-222364875

In order to validate this, you can do the following:

* Configure a new parser. In this example I'll call it a `user` parser, and we'll parse some CSV data to map `username` to `ip` by creating a file `/usr/metron/0.1BETA/config/zookeeper/enrichment/user.json` with:

```
{
  "parserClassName" : "org.apache.metron.parsers.csv.CSVParser"
 ,"writerClassName" : "org.apache.metron.writer.hbase.SimpleHbaseEnrichmentWriter"
 ,"sensorTopic" : "user"
 ,"parserConfig" : {
    "shew.table" : "enrichment"
   ,"shew.cf" : "t"
   ,"shew.keyColumns" : "user"
   ,"shew.enrichmentType" : "user"
   ,"columns" : {
      "user" : 0
     ,"ip" : 1
    }
  }
}
```

* Add a new `user` enrichment type to `bro` data by adding `ip_src_addr` to `hbaseEnrichment` and associating `user` as a field type for `ip_src_addr` in `/usr/metron/0.1BETA/config/zookeeper/enrichment/bro.json`, like so:

```
{
  "index": "bro",
  "batchSize": 5,
  "enrichment": {
    "fieldMap": {
      "geo": [ "ip_dst_addr", "ip_src_addr" ],
      "host": [ "host" ],
      "hbaseEnrichment": [ "ip_src_addr" ]
    },
    "fieldToTypeMap": {
      "ip_src_addr": [ "user" ]
    }
  },
  "threatIntel": {
    "fieldMap": {
      "hbaseThreatIntel": [ "ip_dst_addr", "ip_src_addr" ]
    },
    "fieldToTypeMap": {
      "ip_dst_addr": [ "malicious_ip" ]
     ,"ip_src_addr": [ "malicious_ip" ]
    }
  }
}
```

* Create the Kafka queue as in the tutorials.
* Push up the config you just created using `/usr/metron/0.1BETA/bin/zk_load_configs.sh`:

```
/usr/metron/0.1BETA/bin/zk_load_configs.sh -m PUSH -z node1:2181 -i /usr/metron/0.1BETA/config/zookeeper
```

* Create some reference CSV data that looks like `jsirota,192.168.168.1` in a file named `user.csv`.
* Use the Kafka console producer to push the data into the `user` topic:

```
cat user.csv | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic user
```

* You should be able to verify that the data gets into HBase by running `scan 'enrichment'` from the `hbase shell`.
* After new data has been run through, you should also be able to verify that the data is enriched in Elasticsearch. I would suggest bouncing the enrichment topology to ensure that stale data in the caches gets flushed, but that is not strictly necessary.
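The two verification steps above that lack explicit commands could be sketched roughly as follows. This is a sketch under assumptions, not commands from the tutorials: the topic settings, the `bro*` index pattern, and the queried field name are guesses based on the single-node (`node1`) setup used elsewhere in these instructions, so adjust them to match your deployment.

```shell
# Create the `user` Kafka topic (partition and replication settings here
# are assumptions for a single-node sandbox; tune for a real cluster).
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh \
  --zookeeper node1:2181 --create --topic user \
  --partitions 1 --replication-factor 1

# Spot-check that enriched bro documents show up in Elasticsearch.
# The index pattern `bro*` and the `ip_src_addr` field are assumptions
# about how the bro index is named and mapped in this deployment.
curl -s 'http://node1:9200/bro*/_search?pretty&q=ip_src_addr:192.168.168.1'
```

If the enrichment worked, the documents returned by the query should carry the `user` enrichment fields alongside the original bro fields.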