GitHub user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/517
  
    I spun this up in full-dev and did the following to test:
    * Download the Alexa top 1m data set
    ```
    wget http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
    unzip top-1m.csv.zip
    ```
    * Stage import file
    ```
    head -n 10000 top-1m.csv > top-10k.csv
    head -n 10 top-1m.csv > top-10.csv
    hadoop fs -put top-10k.csv /tmp
    ```
    * Create an extractor config for the CSV data by editing `extractor.json` and pasting in these contents:
    ```
    {
      "config" : {
        "zk_quorum" : "node1:2181",
        "columns" : {
           "rank" : 0,
           "domain" : 1
        },
        "value_transform" : {
           "domain" : "DOMAIN_REMOVE_TLD(domain)",
           "port" : "es.port"
        },
        "value_filter" : "LENGTH(domain) > 0",
        "indicator_column" : "domain",
        "indicator_transform" : {
           "indicator" : "DOMAIN_REMOVE_TLD(indicator)"
        },
        "indicator_filter" : "LENGTH(indicator) > 0",
        "type" : "top_domains",
        "separator" : ","
      },
      "extractor" : "CSV"
    }
    ```
    * Import the enrichment data (truncate the table, load the file, then count the rows)
    `echo "truncate 'enrichment'" | hbase shell && $METRON_HOME/bin/flatfile_loader.sh -i ./top-10k.csv -t enrichment -c t -e ./extractor.json -p 5 -b 128 && echo "count 'enrichment'" | hbase shell`
    * Open up the Stellar shell via `$METRON_HOME/bin/stellar -z node1` and execute the following:
    ```
    MAP(['google', 'pdf2doc', 'yahoo'], indicator -> MAP_GET('domain', ENRICHMENT_GET('top_domains', indicator, 'enrichment', 't')) )
    ```
    You should see `[google, pdf2doc, yahoo]` returned.
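
To make it clearer why that result is expected, here is a rough Python sketch of what the extractor config and the Stellar lookup do together. It is not Metron code: `remove_tld`, `extract`, and `lookup_domain` are hypothetical helpers, the sample ranks are made up, and `remove_tld` is only a naive stand-in for `DOMAIN_REMOVE_TLD` (it drops the last dot-separated label, nothing more). The `"port" : "es.port"` transform is left out because it depends on the global config rather than on the CSV row, and the order in which the transforms and filters are applied is an assumption.

```python
import csv
import io


def remove_tld(domain):
    """Naive stand-in for DOMAIN_REMOVE_TLD: drop the last dot-separated label.

    Assumption: this only approximates the Stellar function and does not
    handle multi-part suffixes such as "co.uk".
    """
    head, sep, _ = domain.rpartition(".")
    return head if sep else domain


def extract(csv_text):
    """Return {indicator: value_map} for rows that pass both filters."""
    table = {}
    for row in csv.reader(io.StringIO(csv_text), delimiter=","):
        # "columns": rank is column 0, domain is column 1
        value = {"rank": row[0], "domain": row[1]}
        # "value_transform": domain -> DOMAIN_REMOVE_TLD(domain)
        value["domain"] = remove_tld(value["domain"])
        # "value_filter": LENGTH(domain) > 0
        if len(value["domain"]) == 0:
            continue
        # "indicator_column" is domain; "indicator_transform" strips the TLD
        indicator = remove_tld(row[1])
        # "indicator_filter": LENGTH(indicator) > 0
        if len(indicator) == 0:
            continue
        table[indicator] = value
    return table


def lookup_domain(table, indicator):
    """Mimics MAP_GET('domain', ENRICHMENT_GET('top_domains', indicator, ...))."""
    return table.get(indicator, {}).get("domain")


# Hypothetical sample rows in the same rank,domain layout as top-1m.csv
sample = "1,google.com\n2,yahoo.com\n3,pdf2doc.com\n"
table = extract(sample)

# Equivalent of the MAP(...) expression run in the Stellar shell
result = [lookup_domain(table, i) for i in ["google", "pdf2doc", "yahoo"]]
print(result)
```

Because both the indicator and the stored `domain` value have the TLD stripped, looking up `google` finds the row that came from `google.com` and returns `google`, which is why the three inputs come back unchanged.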

