mmiklavc edited a comment on issue #1523: METRON-2232 Upgrade to Hadoop 3.1.1
URL: https://github.com/apache/metron/pull/1523#issuecomment-540208150

## Testing

Adapted from a few places:

* https://gist.github.com/nickwallen/ed67fdc8b399f6db5fa4901b07fc3fff
* https://cwiki.apache.org/confluence/display/METRON/2016/04/25/Metron+Tutorial+-+Fundamentals+Part+1%3A+Creating+a+New+Telemetry

### Preliminaries

Test using the centos7 development environment.

* Start up the centos7 dev environment.
```
cd metron-deployment/development/centos7
vagrant destroy -f
vagrant up
# ssh into the box as root@node1, pwd=vagrant
```
* Running as root is fine.
* Set the env vars.
```
source /etc/default/metron
```
* The root user needs a home dir in HDFS. You can create it as follows:
```
sudo -u hdfs hdfs dfs -mkdir /user/root
sudo -u hdfs hdfs dfs -chown root:root /user/root
```
* Download the Alexa top 1m data set.
```
cd ~/
wget http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
unzip top-1m.csv.zip
```
* Stage the import file.
```
head -n 10000 top-1m.csv > top-10k.csv
hdfs dfs -put top-10k.csv /tmp
```
* Truncate HBase.
```
echo "truncate 'enrichment'" | hbase shell
```

### Basic Indexing and Enrichment

Ensure that we can continue to parse, enrich, and index telemetry, and verify that data is flowing through the system from parsing to indexing.

1. Open Ambari and navigate to the Metron service: http://node1:8080/#/main/services/METRON/summary
1. Open the Alerts UI. Verify alerts show up in the main UI - click the search icon (you may need to wait a moment for them to appear).
1. In the Alerts UI, ensure that an ever-increasing amount of telemetry from Bro, Snort, and YAF is visible by watching the total alert count increase over time.
1. Ensure that geoip enrichment is occurring. The telemetry should contain fields like `enrichments:geo:ip_src_addr:location_point`.
1. Head back to Ambari and select the Kibana service: http://node1:8080/#/main/services/KIBANA/summary
1. Open the Kibana dashboard via the "Metron UI" option in the quick links.
1. Verify the dashboard is populating.

### Batch Indexing

1. Use the Alerts UI to retrieve a rough count of the number of Bro messages that have been indexed.
1. Retrieve the number of Bro messages that have been indexed in HDFS.
```
[root@node1 0.7.2]# hdfs dfs -cat /apps/metron/indexing/indexed/bro/* | wc -l
2785
```
1. The number of messages indexed in HDFS should be close to the number indexed to the search indices.

### Streaming Enrichments

Adapted from the [Metron Tutorial Series](https://cwiki.apache.org/confluence/display/METRON/2016/06/16/Metron+Tutorial+-+Fundamentals+Part+6%3A+Streaming+Enrichment).

1. Launch the Stellar REPL.
```
cd $METRON_HOME
$METRON_HOME/bin/stellar -z $ZOOKEEPER
```
1. Define the streaming enrichment and save it as a new source of telemetry.
```
[Stellar]>>> conf := SHELL_EDIT(conf)
{
  "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
  "writerClassName": "org.apache.metron.writer.hbase.SimpleHbaseEnrichmentWriter",
  "sensorTopic": "user",
  "parserConfig": {
    "shew.table": "enrichment",
    "shew.cf": "t",
    "shew.keyColumns": "ip",
    "shew.enrichmentType": "user",
    "columns": {
      "user": 0,
      "ip": 1
    }
  }
}
[Stellar]>>>
[Stellar]>>> CONFIG_PUT("PARSER", conf, "user")
```
1. Go to the Management UI and start the new parser called 'user'.
1. Create some test telemetry.
```
[Stellar]>>> msgs := ["user1,192.168.1.1", "user2,192.168.1.2", "user3,192.168.1.3"]
[user1,192.168.1.1, user2,192.168.1.2, user3,192.168.1.3]
[Stellar]>>> KAFKA_PUT("user", msgs)
3
[Stellar]>>> KAFKA_PUT("user", msgs)
3
[Stellar]>>> KAFKA_PUT("user", msgs)
3
```
1. Ensure that the enrichments are persisted in HBase.
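An aside on what to expect here: three identical `KAFKA_PUT` batches still leave only three rows, because the writer keys each HBase row on the `ip` column (`shew.keyColumns`), so repeated writes overwrite rather than append. A minimal Python sketch of that behavior, with a plain dict standing in for HBase (illustrative only, not Metron code):

```python
# Plain dict standing in for the HBase 'enrichment' table (a sketch, not
# Metron code). The writer keys rows on the "ip" column per shew.keyColumns,
# so re-sending the same batch overwrites the same rows.
hbase = {}
msgs = ["user1,192.168.1.1", "user2,192.168.1.2", "user3,192.168.1.3"]
for _ in range(3):                      # three KAFKA_PUT("user", msgs) calls
    for msg in msgs:
        user, ip = msg.split(",")
        hbase[ip] = {"user": user}      # same row key => overwrite, not append
print(len(hbase))                       # 3 distinct rows, despite 9 writes
```

The `ENRICHMENT_GET` checks that follow should accordingly return one row per distinct IP.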
```
[Stellar]>>> ENRICHMENT_GET('user', '192.168.1.1', 'enrichment', 't')
{original_string=user1,192.168.1.1, guid=a6caf3c1-2506-4eb7-b33e-7c05b77cd72c, user=user1, timestamp=1551813589399, source.type=user}
[Stellar]>>> ENRICHMENT_GET('user', '192.168.1.2', 'enrichment', 't')
{original_string=user2,192.168.1.2, guid=49e4b8fa-c797-44f0-b041-cfb47983d54a, user=user2, timestamp=1551813589399, source.type=user}
[Stellar]>>> ENRICHMENT_GET('user', '192.168.1.3', 'enrichment', 't')
{original_string=user3,192.168.1.3, guid=324149fd-6c4c-42a3-b579-e218c032ea7f, user=user3, timestamp=1551813589402, source.type=user}
```

### Enrichment Coprocessor

1. Confirm that the 'user' enrichment added in the previous section was 'found' by the coprocessor.
    * Go to Swagger.
    * Click the `sensor-enrichment-config-controller` option.
    * Click the `GET /api/v1/sensor/enrichment/config/list/available/enrichments` option.
1. Click the "Try it out!" button. You should see an array returned with the value of each enrichment type that you have loaded.
```
[
  "user"
]
```

### Enrichment Stellar Functions in Storm

Adapted from the [Metron Tutorial Series](https://cwiki.apache.org/confluence/display/METRON/2016/04/28/Metron+Tutorial+-+Fundamentals+Part+2%3A+Creating+a+New+Enrichment) to load the user data.

1. Create a simple file called `user.csv`.
```
jdoe,192.168.138.2
moredoe,192.168.138.158
```
1. Create a file called `user-extractor.json`.
```
{
  "config": {
    "columns": {
      "user": 0,
      "ip": 1
    },
    "indicator_column": "ip",
    "separator": ",",
    "type": "user"
  },
  "extractor": "CSV"
}
```
1. Import the data.
```
source /etc/default/metron
$METRON_HOME/bin/flatfile_loader.sh -i ./user.csv -t enrichment -c t -e ./user-extractor.json
```
1. Validate that the enrichment loaded successfully.
```
[root@node1 0.7.2]# source /etc/default/metron
[root@node1 0.7.2]# $METRON_HOME/bin/stellar -z $ZOOKEEPER
[Stellar]>>> ip_src_addr := "192.168.138.158"
192.168.138.158
[Stellar]>>> ENRICHMENT_GET('user', ip_src_addr, 'enrichment', 't')
{ip=192.168.138.158, user=moredoe}
[Stellar]>>> ip_dst_addr := "192.168.138.2"
192.168.138.2
[Stellar]>>> ENRICHMENT_GET('user', ip_dst_addr, 'enrichment', 't')
{ip=192.168.138.2, user=jdoe}
```
1. Use the user data to enrich the telemetry. Run the following commands in the REPL.
```
[Stellar]>>> bro := SHELL_EDIT()
{
  "enrichment" : {
    "fieldMap": {
      "stellar" : {
        "config" : {
          "users" : "ENRICHMENT_GET('user', ip_dst_addr, 'enrichment', 't')",
          "users2" : "ENRICHMENT_GET('user', ip_src_addr, 'enrichment', 't')"
        }
      }
    }
  },
  "threatIntel": {
    "fieldMap": {},
    "fieldToTypeMap": {}
  }
}
[Stellar]>>> CONFIG_PUT("ENRICHMENT", bro, "bro")
```
1. Wait for the new configuration to be picked up by the running topology.
1. Review the Bro telemetry indexed into Elasticsearch. Look for records where the `ip_dst_addr` is `192.168.138.2`. Ensure that some of the messages have the following fields created from the enrichment. (Wait a few minutes longer and you should also eventually start to see records with the field `"users2:user": "moredoe"`.)
    * `users:user`
    * `users:ip`
```
{
  "_index": "bro_index_2019.08.13.20",
  "_type": "bro_doc",
  "_id": "AWyMxSJFg1bv3MpSt284",
  ...
  "_source": {
    "ip_dst_addr": "192.168.138.2",
    "ip_src_addr": "192.168.138.158",
    "timestamp": 1565729823979,
    "source:type": "bro",
    "guid": "6778beb4-569d-478f-b1c9-8faaf475ac2f",
    ...
    "users:user": "jdoe",
    "users:ip": "192.168.138.2",
    ...
  },
  ...
}
```

### Loaders and Summarizers in MR mode

#### Test the flatfile loader in MR mode

* Create an extractor for the CSV data by editing `extractor.json` and pasting in these contents:
```
{
  "config" : {
    "columns" : {
      "domain" : 1,
      "rank" : 0
    },
    "indicator_column" : "domain",
    "type" : "alexa",
    "separator" : ","
  },
  "extractor" : "CSV"
}
```
* Import from HDFS via MR.
```
# import data into hbase
$METRON_HOME/bin/flatfile_loader.sh -i /tmp/top-10k.csv -t enrichment -c t -e ./extractor.json -m MR

# count the rows written and verify it's 10k
echo "count 'enrichment'" | hbase shell
```

#### Test the flatfile summarizer in MR mode

* Create a file called `extractor_count.json` and paste the following:
```
{
  "config" : {
    "columns" : {
      "rank" : 0,
      "domain" : 1
    },
    "value_transform" : {
      "domain" : "DOMAIN_REMOVE_TLD(domain)"
    },
    "value_filter" : "LENGTH(domain) > 0",
    "state_init" : "0L",
    "state_update" : {
      "state" : "state + LENGTH( DOMAIN_TYPOSQUAT( domain ))"
    },
    "state_merge" : "REDUCE(states, (s, x) -> s + x, 0)",
    "separator" : ","
  },
  "extractor" : "CSV"
}
```
* Create the summary from HDFS via MR.
```
$METRON_HOME/bin/flatfile_summarizer.sh -i /tmp/top-10k.csv -e ~/extractor_count.json -p 5 -om CONSOLE -m MR
```
* Verify you see a count in the output similar to the following:
```
Processing /root/top-10k.csv
19/10/03 21:19:56 WARN resolver.BaseFunctionResolver: Using System classloader
Processed 9999 - \
3478276
```

### Legacy HBase Adapter

We are going to perform the same enrichment, but this time using the legacy HBase adapter.

1. Use the user data to enrich the telemetry. Run the following commands in the REPL.
```
[Stellar]>>> yaf := SHELL_EDIT()
{
  "enrichment" : {
    "fieldMap" : {
      "hbaseEnrichment" : [ "ip_dst_addr" ]
    },
    "fieldToTypeMap" : {
      "ip_dst_addr" : [ "user" ]
    },
    "config" : {
      "typeToColumnFamily" : {
        "user" : "t"
      }
    }
  },
  "threatIntel" : { },
  "configuration" : { }
}
[Stellar]>>> CONFIG_PUT("ENRICHMENT", yaf, "yaf")
```
1. Wait for the new configuration to be picked up by the running topology.
1. Review the YAF telemetry indexed into Elasticsearch. Look for records where the `ip_dst_addr` is `192.168.138.2`. Ensure that some of the messages have the following fields created from the enrichment.
    * `enrichments:hbaseEnrichment:ip_dst_addr:user:ip`
    * `enrichments:hbaseEnrichment:ip_dst_addr:user:user`
```
{
  "_index": "yaf_index_2019.08.15.03",
  "_type": "yaf_doc",
  "_id": "AWyTZAwEIFY9jxc2THLF",
  "_version": 1,
  "_score": null,
  "_source": {
    "source:type": "yaf",
    "ip_dst_addr": "192.168.138.2",
    "ip_src_addr": "192.168.138.158",
    "guid": "6c73c09d-f099-4646-b653-762adce121fe",
    ...
    "enrichments:hbaseEnrichment:ip_dst_addr:user:ip": "192.168.138.2",
    "enrichments:hbaseEnrichment:ip_dst_addr:user:user": "jdoe"
  }
}
```

### Profiler

#### Profiler in the REPL

1. Test a profile in the REPL according to [these instructions](https://github.com/apache/metron/tree/master/metron-analytics/metron-profiler-repl#getting-started).
```
[Stellar]>>> values := PROFILER_FLUSH(profiler)
[{period={duration=900000, period=1723089, start=1550780100000, end=1550781000000}, profile=hello-world, groups=[], value=4, entity=192.168.138.158}]
```

#### Streaming Profiler

1. Deploy that profile to the Streaming Profiler in Storm.
```
[Stellar]>>> CONFIG_PUT("PROFILER", conf)
```
1. Wait for the Streaming Profiler in Storm to flush, then retrieve the measurement from HBase. For the impatient, you can reset the period duration to 1 minute. Alternatively, you can allow the Profiler topology to work for a minute or two and then kill the `profiler` topology, which will force it to flush a profile measurement to HBase. Prior to this PR, it was not possible to query HBase from the REPL.
```
[Stellar]>>> PROFILE_GET("hello-world","192.168.138.158",PROFILE_FIXED(30,"DAYS"))
[2979]
```

#### Batch Profiler

1. Stop Storm, YARN, Elasticsearch, Kibana, and Kafka.
1. Install Spark2 using Ambari.
1. Ensure that Spark can talk with HBase.
```
cp /etc/hbase/conf/hbase-site.xml /etc/spark2/conf/
```
1. Use the Batch Profiler to back-fill your profile. To do this, follow the directions [provided here](https://github.com/apache/metron/tree/master/metron-analytics/metron-profiler-spark#getting-started).
1. Retrieve the entire profile, including the back-filled data.
```
[Stellar]>>> PROFILE_GET("hello-world","192.168.138.158",PROFILE_FIXED(30,"DAYS"))
[1203, 2849, 2900, 1944, 1054, 1241, 1721]
```

### PCAP

Pulled from https://github.com/apache/metron/pull/1157#issuecomment-412972370

Get PCAP data into Metron:

1. Install and set up pycapa (this has been updated in master recently) - https://github.com/apache/metron/blob/master/metron-sensors/pycapa/README.md#centos-6
2. (if using singlenode vagrant) Kill the enrichment, profiler, indexing, and sensor topologies via `for i in bro enrichment random_access_indexing batch_indexing yaf snort;do storm kill $i;done`
3. Start the pcap topology via `$METRON_HOME/bin/start_pcap_topology.sh`
4. Start the pycapa packet capture producer on eth1.
```
cd /opt/pycapa/pycapa-venv/bin/usr/bin
pycapa --producer --kafka-topic pcap --interface eth1 --kafka-broker $BROKERLIST
```
5. Watch the topology in the Storm UI and kill the packet capture utility started earlier once the number of packets ingested is over 3k.
6. You can now leave your virtualenv session via `deactivate`
7. Ensure that at least 3 files exist on HDFS by running `hdfs dfs -ls /apps/metron/pcap/input`
8. Choose a file (denoted by $FILE) and dump a few of the contents using the pcap_inspector utility.
```
FILE=<file path in hdfs>
$METRON_HOME/bin/pcap_inspector.sh -i $FILE -n 5
```
9. Choose one of the lines in your output and note the protocol, e.g.
```
TS: October 9, 2019 8:43:39 PM UTC,ip_src_addr: 192.168.66.1,ip_src_port: 60911,ip_dst_addr: 192.168.66.121,ip_dst_port: 8080,protocol: 6
TS: October 9, 2019 8:43:39 PM UTC,ip_src_addr: 192.168.66.121,ip_src_port: 8080,ip_dst_addr: 192.168.66.1,ip_dst_port: 60911,protocol: 6
TS: October 9, 2019 8:43:39 PM UTC,ip_src_addr: 192.168.66.121,ip_src_port: 8080,ip_dst_addr: 192.168.66.1,ip_dst_port: 60911,protocol: 6
TS: October 9, 2019 8:43:39 PM UTC,ip_src_addr: 192.168.66.121,ip_src_port: 8080,ip_dst_addr: 192.168.66.1,ip_dst_port: 60911,protocol: 6
TS: October 9, 2019 8:43:39 PM UTC,ip_src_addr: 192.168.66.1,ip_src_port: 60911,ip_dst_addr: 192.168.66.121,ip_dst_port: 8080,protocol: 6
```

**Note:** when you run the fixed and query filter commands below, the resulting files will be placed in the directory from which you kicked off the job.

#### Fixed filter

1. Run a fixed filter query by executing the following command with the values noted above (match your start_time format to the date format provided - the default is to use millis since epoch).
2. `cd ~/; $METRON_HOME/bin/pcap_query.sh fixed -st <start_time> -df "yyyyMMdd" -p <protocol_num> -rpf 500`
3. Verify the MR job finishes successfully. Upon completion, you should see multiple files named with relatively current datestamps in your current directory, e.g. pcap-data-20160617160549737+0000.pcap
4. Copy the files to your local machine and verify you can open them in Wireshark. I chose a middle file and the last file. The middle file should have 500 records (per the records_per_file option), and the last one will likely have a number of records <= 500.

#### Query filter

1. Run a Stellar query filter by executing a command similar to the following, with the values noted above (match your start_time format to the date format provided - the default is to use millis since epoch).
2. `$METRON_HOME/bin/pcap_query.sh query -st "20160617" -df "yyyyMMdd" -query "protocol == '6'" -rpf 500`
3. Verify the MR job finishes successfully. Upon completion, you should see multiple files named with relatively current datestamps in your current directory, e.g. pcap-data-20160617160549737+0000.pcap
4. Copy the files to your local machine and verify you can open them in Wireshark. I chose a middle file and the last file. The middle file should have 500 records (per the records_per_file option), and the last one will likely have a number of records <= 500.

### MaaS

Follow the example from this README - https://github.com/apache/metron/tree/master/metron-analytics/metron-maas-service#example

Updated MaaS testing should follow from https://github.com/apache/metron/pull/1536
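A final note on the PCAP queries above: when no `-df` date format is supplied, `pcap_query.sh` interprets `-st` as milliseconds since the epoch. A small helper for computing that value (hypothetical, not part of Metron):

```python
from datetime import datetime, timezone

# Hypothetical helper (not part of Metron) for computing the -st argument to
# pcap_query.sh when no -df date format is given: millis since the epoch, UTC.
def to_epoch_millis(date_str, fmt="%Y%m%d"):
    dt = datetime.strptime(date_str, fmt).replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

print(to_epoch_millis("20160617"))   # 1466121600000
```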