[ 
https://issues.apache.org/jira/browse/METRON-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Allen updated METRON-1699:
-------------------------------
    Fix Version/s: Next + 1

> Create Batch Profiler
> ---------------------
>
>                 Key: METRON-1699
>                 URL: https://issues.apache.org/jira/browse/METRON-1699
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Nick Allen
>            Assignee: Nick Allen
>            Priority: Major
>             Fix For: Next + 1
>
>         Attachments: Screen Shot 2018-07-27 at 10.55.27 AM.png, Screen Shot 
> 2018-07-27 at 11.07.33 AM.png, Screen Shot 2018-07-27 at 11.10.16 AM.png
>
>
> Create a Batch Profiler that satisfies the following use cases.
> h3. Use Cases
>  * As a Security Data Scientist, I want to understand the historical 
> behaviors and trends of a profile that I have created so that I can determine 
> if I have created a feature set that has predictive value for model building.
>  * As a Security Data Scientist, I want to understand the historical 
> behaviors and trends of a profile that I have created so that I can determine 
> if I have defined the profile correctly and created a feature set that 
> matches reality.
>  * As a Security Platform Engineer, I want to generate a profile using 
> archived telemetry when I deploy a new model to production so that models 
> depending on that profile can function on day 1.
> h3. Goal
>  * Currently, a profile can only be generated from the telemetry consumed 
> *after* the profile was created.
>  * The goal would be to enable “profile seeding” which allows profiles to be 
> populated from a time *before* the profile was created.
>  * A profile would be seeded using the telemetry that has been archived by 
> Metron in HDFS.
>  * A profile consumer should not be able to distinguish the “seeded” portion 
> of a profile.
> !Screen Shot 2018-07-27 at 10.55.27 AM.png!
> h3. Current State
>  * There are currently two ports of the Profiler: the Streaming Profiler, 
> which handles streaming data in Storm, and a second port that runs in the 
> REPL and allows a user to manually build, test, and debug profiles.
>  * These ports largely share a common code base in 
> metron-analytics/metron-profiler-common.
>  * Each port requires its own smaller set of “orchestration” logic: one for 
> Storm, another for the REPL.
>  * Both Profiler ports support both system time and event time processing.
> !Screen Shot 2018-07-27 at 11.07.33 AM.png!
> h3. Approach
>  * Create a third port of the Profiler: the Batch Profiler.
>  * The Batch Profiler will be built to run in Spark so that the telemetry can 
> be consumed in batch.
>  * It will allow a user to seed profiles using the JSON telemetry that is 
> archived in HDFS by Metron Indexing.
>  * It will generate only the profile data stored in HBase, not the messages 
> that are produced for Threat Triage and Kafka.
>  * Any number of profiles can be generated at once, but no dependencies 
> between the profiles are supported. A dependency is where one profile is a 
> consumer of the profile generated by another.
>  * The Batch Profiler must use the timestamps contained within the telemetry; 
> it runs on event time. Luckily the Profiler already supports event time.
>  * Enable a pluggable mechanism so that telemetry stored in different formats 
> can be consumed by the Batch Profiler. For example, the Profiler should be 
> able to consume telemetry stored as raw JSON or in other formats like ORC or 
> Parquet.  
> !Screen Shot 2018-07-27 at 11.10.16 AM.png!
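The event-time and pluggable-reader points in the Approach can be illustrated with a minimal sketch. This is not Metron code and does not use Spark; it only shows the idea of assigning archived messages to profile periods by the timestamp inside each message rather than by the wall clock. The names `READERS`, `read_json_lines`, `period_of`, and `seed_profile` are hypothetical, and the `timestamp` (epoch milliseconds) and `ip_src_addr` fields and the 15-minute period are assumptions made for the example.

```python
import json
from collections import defaultdict

# Assumed period duration for the example: 15 minutes, in milliseconds.
PERIOD_MS = 15 * 60 * 1000


def read_json_lines(lines):
    """Hypothetical reader: parse one JSON message per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical registry standing in for the "pluggable mechanism".
# Readers for other formats (for example ORC or Parquet) would be added here.
READERS = {"json": read_json_lines}


def period_of(timestamp_ms, period_ms=PERIOD_MS):
    """Map an event timestamp to the index of the period that contains it."""
    return timestamp_ms // period_ms


def seed_profile(records, entity_field, period_ms=PERIOD_MS):
    """Count messages per (entity, period) using event time only.

    Messages missing the timestamp or the entity field are skipped.
    """
    counts = defaultdict(int)
    for msg in records:
        ts = msg.get("timestamp")
        entity = msg.get(entity_field)
        if ts is None or entity is None:
            continue
        counts[(entity, period_of(ts, period_ms))] += 1
    return dict(counts)


# Made-up archived telemetry, one JSON message per line.
archived = [
    '{"ip_src_addr": "10.0.0.1", "timestamp": 0}',
    '{"ip_src_addr": "10.0.0.1", "timestamp": 60000}',
    '{"ip_src_addr": "10.0.0.1", "timestamp": 900000}',
    '{"ip_src_addr": "10.0.0.2", "timestamp": 900001}',
]

profile = seed_profile(READERS["json"](archived), "ip_src_addr")
```

Because the period is derived only from the message timestamp, running the same logic over archived data yields the same (entity, period) keys that live processing on event time would, which is what would keep a seeded portion indistinguishable to a profile consumer.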



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
