[ https://issues.apache.org/jira/browse/METRON-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863166#comment-15863166 ]
ASF GitHub Bot commented on METRON-701:
---------------------------------------

Github user james-sirota commented on a diff in the pull request:

    https://github.com/apache/incubator-metron/pull/449#discussion_r100719426

    --- Diff: metron-analytics/metron-profiler/src/main/java/org/apache/metron/profiler/bolt/KafkaDestinationHandler.java ---
    @@ -0,0 +1,78 @@
    +/*
    + *  Licensed to the Apache Software Foundation (ASF) under one
    + *  or more contributor license agreements.  See the NOTICE file
    + *  distributed with this work for additional information
    + *  regarding copyright ownership.  The ASF licenses this file
    + *  to you under the Apache License, Version 2.0 (the
    + *  "License"); you may not use this file except in compliance
    + *  with the License.  You may obtain a copy of the License at
    + *
    + *      http://www.apache.org/licenses/LICENSE-2.0
    + *
    + *  Unless required by applicable law or agreed to in writing, software
    + *  distributed under the License is distributed on an "AS IS" BASIS,
    + *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + *  See the License for the specific language governing permissions and
    + *  limitations under the License.
    + *
    + */
    +
    +package org.apache.metron.profiler.bolt;
    +
    +import org.apache.metron.common.utils.JSONUtils;
    +import org.apache.metron.profiler.ProfileMeasurement;
    +import org.apache.storm.task.OutputCollector;
    +import org.apache.storm.topology.OutputFieldsDeclarer;
    +import org.apache.storm.tuple.Fields;
    +import org.apache.storm.tuple.Values;
    +import org.json.simple.JSONObject;
    +
    +import java.io.Serializable;
    +
    +/**
    + * Handles emitting a ProfileMeasurement to the stream which writes
    + * profile measurements to Kafka.
    + */
    +public class KafkaDestinationHandler implements DestinationHandler, Serializable {
    +
    +  /**
    +   * The stream identifier used for this destination;
    +   */
    +  private String streamId = "kafka";
    +
    +  @Override
    +  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    +    // the kafka writer expects a field named 'message'
    +    declarer.declareStream(getStreamId(), new Fields("message"));
    +  }
    +
    +  @Override
    +  public void emit(ProfileMeasurement measurement, OutputCollector collector) {
    +
    +    try {
    +      JSONObject message = new JSONObject();
    +      message.put("profile", measurement.getDefinition().getProfile());
    +      message.put("entity", measurement.getEntity());
    +      message.put("period", measurement.getPeriod().getPeriod());
    +      message.put("periodStartTime", measurement.getPeriod().getStartTimeMillis());
    +
    +      // TODO How to serialize an object (like a StatisticsProvider) in a form that can be used on the other side?  (Threat Triage)
    +      // TODO How to embed binary in JSON?
    +      message.put("value", measurement.getValue());
    +
    --- End diff --

    One problem I see with this approach is that you generally wouldn't ask the question "is the number of inbound connections from IP X abnormal for the past 15 minutes?" The question you would ask is "is the number of inbound connections for IP X abnormal for a Tuesday at 11:15 AM?" To answer that question we would have to retrieve telemetry from historical Tuesdays at 11:15 AM and check whether the value at hand right now is an outlier relative to that data. METRON-690 is designed to make these retrieval patterns possible. Because you are retrieving values that are potentially more than 24 hours old, I am not sure keeping this data in Kafka is a good idea.
    I think you have to retrieve the values via a multi-get in HBase.


> Triage Metrics Produced by the Profiler
> ---------------------------------------
>
>                 Key: METRON-701
>                 URL: https://issues.apache.org/jira/browse/METRON-701
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Nick Allen
>            Assignee: Nick Allen
>
> h3. Problem
> The motivating example is that I would like to create an alert if the number
> of inbound flows to any host over a 15 minute interval is abnormal.
> The value being interrogated here, the number of inbound flows, is not a
> static value contained within any single telemetry message. This value is
> calculated across multiple messages by the Profiler. The current Threat
> Triage process cannot be used to interrogate values calculated by the
> Profiler.
> h3. Proposed Solution
> I am proposing that we treat the Profiler as a source of telemetry. The
> measurements captured by the Profiler would be enqueued into a Kafka topic.
> We would then treat those Profiler messages like any other telemetry. We
> would parse, enrich, triage, and index those messages.
> This would have the following advantages.
> 1. We would be able to reuse the same threat triage mechanism for values
> calculated by the Profiler.
> 2. We would be able to generate profiles from the profiled data - aka
> meta-profiles anyone?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
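
For illustration only, the check described in the review comment (is the current value abnormal for this time slot, judged against the same slot in earlier weeks?) could be sketched roughly as below. This is not Metron code. The class name `SeasonalOutlierCheck`, the z-score test, and the `threshold` parameter are all assumptions made for the example, and the historical values are assumed to have been fetched already (for instance by the HBase multi-get mentioned above).

```java
import java.util.List;

/**
 * Hypothetical helper, not part of Metron: decides whether the current
 * measurement is unusual compared with measurements taken in the same
 * weekly time slot (for example, previous Tuesdays at 11:15 AM).
 */
final class SeasonalOutlierCheck {

  private SeasonalOutlierCheck() {}

  /**
   * @param history   values from the same time slot in earlier weeks
   * @param current   the value observed right now
   * @param threshold number of standard deviations treated as abnormal (assumed, e.g. 3.0)
   * @return true if current lies more than threshold standard deviations from the mean
   */
  static boolean isOutlier(List<Double> history, double current, double threshold) {
    if (history.size() < 2) {
      return false; // too little history to judge
    }

    // mean of the historical values
    double mean = 0.0;
    for (double v : history) {
      mean += v;
    }
    mean /= history.size();

    // sample standard deviation of the historical values
    double sumSq = 0.0;
    for (double v : history) {
      sumSq += (v - mean) * (v - mean);
    }
    double stdDev = Math.sqrt(sumSq / (history.size() - 1));

    if (stdDev == 0.0) {
      // history is constant, so any different value is treated as abnormal
      return current != mean;
    }
    return Math.abs(current - mean) / stdDev > threshold;
  }
}
```

The point of the sketch is that the inputs are values from many past weeks, not only the latest 15-minute measurement, which is why the retrieval path matters more than the transport used for the most recent value.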