Github user cestella commented on the issue:

https://github.com/apache/incubator-metron/pull/517

I spun this up in full-dev and did the following to test:

* Download the Alexa top 1m data set
```
wget http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
unzip top-1m.csv.zip
```
* Stage import file
```
head -n 10000 top-1m.csv > top-10k.csv
head -n 10 top-1m.csv > top-10.csv
hadoop fs -put top-10k.csv /tmp
```
* Create an extractor.json for the CSV data by editing `extractor.json` and pasting in these contents:
```
{
  "config" : {
    "zk_quorum" : "node1:2181",
    "columns" : {
      "rank" : 0,
      "domain" : 1
    },
    "value_transform" : {
      "domain" : "DOMAIN_REMOVE_TLD(domain)",
      "port" : "es.port"
    },
    "value_filter" : "LENGTH(domain) > 0",
    "indicator_column" : "domain",
    "indicator_transform" : {
      "indicator" : "DOMAIN_REMOVE_TLD(indicator)"
    },
    "indicator_filter" : "LENGTH(indicator) > 0",
    "type" : "top_domains",
    "separator" : ","
  },
  "extractor" : "CSV"
}
```
* Import enriched data
`echo "truncate 'enrichment'" | hbase shell && $METRON_HOME/bin/flatfile_loader.sh -i ./top-10k.csv -t enrichment -c t -e ./extractor.json -p 5 -b 128 && echo "count 'enrichment'" | hbase shell`
* Open up the stellar shell via `$METRON_HOME/bin/stellar -z node1` and execute the following:
```
MAP(['google', 'pdf2doc', 'yahoo'], indicator -> MAP_GET('domain', ENRICHMENT_GET('top_domains', indicator, 'enrichment', 't')) )
```

You should see `[google, pdf2doc, yahoo]` returned.
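
As an optional sanity check before running the import, the staged file can be verified against the shape the extractor config expects. This is a minimal sketch, not part of Metron: `check_csv` is a hypothetical helper name, and it assumes every row has exactly two comma-separated fields (`rank,domain`), matching the `columns` and `separator` settings above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Metron): verify that a staged CSV
# has the expected number of lines and that every row is "rank,domain"
# with a non-empty domain field.
check_csv() {
  local file="$1" expected="$2"
  local lines
  # wc -l may pad its output with spaces on some platforms; strip them.
  lines=$(wc -l < "$file" | tr -d ' ')
  if [ "$lines" -ne "$expected" ]; then
    echo "expected $expected lines, found $lines" >&2
    return 1
  fi
  # Fail if any row does not have exactly two fields or has an empty domain.
  awk -F, 'NF != 2 || $2 == "" { bad = 1 } END { exit bad }' "$file"
}
```

For the files staged above, this would be called as `check_csv top-10k.csv 10000`; a non-zero exit status means the file does not match the assumed layout.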