[ https://issues.apache.org/jira/browse/METRON-594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15705868#comment-15705868 ]
Nick Allen commented on METRON-594:
-----------------------------------

I would like to get feedback from the community. How should this work? What use cases do you envision? What features do we need to support this?

> Replay Telemetry Data through Profiler
> --------------------------------------
>
>                 Key: METRON-594
>                 URL: https://issues.apache.org/jira/browse/METRON-594
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Nick Allen
>
> The Profiler currently consumes live telemetry, in real time, as it is
> streamed through Metron. A useful extension of this functionality would
> allow the Profiler to also consume archived, historical telemetry. Allowing
> a user to selectively replay archived, historical raw telemetry through the
> Profiler has a number of applications. The following use cases help describe
> why this might be useful.
>
> Use Case 1 - Model Development
> When developing a new model, I often need a feature set of historical data on
> which to train my model. I can either wait days, weeks, or months for the
> Profiler to generate this based on live data, or I could re-run the raw,
> historical telemetry through the Profiler to get started immediately. It is
> much simpler to use the same mechanism to create this historical data set
> than a separate batch-driven tool to recreate something that approximates the
> historical feature set.
>
> Use Case 2 - Model Deployment
> When deploying an analytical model to a new environment, like production, on
> day 1 there is often no historical data for the model to work with. This
> often leaves a gap between when the model is deployed and when that model is
> actually useful. If I could replay raw telemetry through the Profiler, a
> historical feature set could be created as part of the deployment process.
> This allows my model to start functioning on day 1.
>
> Use Case 3 - Profile Validation
> When creating a Profile, it is difficult to understand how the configured
> profile might behave against the entire data set. By creating the profile
> and watching it consume real-time streaming data, I only have an
> understanding of how it behaves on that small segment of data. If I am able
> to replay historical telemetry, I can instantly understand how it behaves on
> a much larger data set, including all the anomalies and exceptions that
> exist in all large data sets.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)