[ https://issues.apache.org/jira/browse/METRON-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539068#comment-16539068 ]
ASF GitHub Bot commented on METRON-1644:
----------------------------------------

Github user cestella commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1084#discussion_r201448152

    --- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/message/metadata/EnvelopedRawMessageStrategy.java ---
    @@ -0,0 +1,146 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.metron.common.message.metadata;
    +
    +import org.apache.metron.common.Constants;
    +import org.apache.metron.common.utils.JSONUtils;
    +import org.json.simple.JSONObject;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +import java.io.IOException;
    +import java.lang.invoke.MethodHandles;
    +import java.util.HashMap;
    +import java.util.Map;
    +
    +/**
    + * An alternative strategy whereby
    + * <ul>
    + *   <li>The raw data is presumed to be a JSON Map</li>
    + *   <li>The data to be parsed is the contents of one of the fields.</li>
    + *   <li>The non-data fields are considered metadata</li>
    + * </ul>
    + *
    + * Additionally, the defaults around merging and reading metadata are adjusted to be on by default.
    + * Note, this strategy allows for parser chaining and for a fully worked example, check the parser chaining use-case.
    + */
    +public class EnvelopedRawMessageStrategy implements RawMessageStrategy {
    +  private static final Logger LOG = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
    +  /**
    +   * The field from the rawMessageStrategyConfig in the SensorParserConfig that defines the field to use to
    +   * define the data to be parsed.
    +   */
    +  public static final String MESSAGE_FIELD_CONFIG = "messageField";
    +
    +  /**
    +   * Retrieve the raw message by parsing the JSON Map in the kafka value and pulling the appropriate field.
    +   * Also, augment the default metadata with the non-data fields in the JSON Map.
    +   *
    +   * Note: The data field in the JSON Map is not considered metadata.
    +   *
    +   * @param rawMetadata The metadata read from kafka Key (e.g. the topic, index, etc.)
    +   * @param rawMessage The raw message from the kafka value
    +   * @param readMetadata True if we want to read read the metadata
    +   * @param config The config for the RawMessageStrategy (See the rawMessageStrategyConfig in the SensorParserConfig)
    +   * @return
    +   */
    +  @Override
    +  public RawMessage get(Map<String, Object> rawMetadata, byte[] rawMessage, boolean readMetadata, Map<String, Object> config) {
    +    String messageField = (String)config.get(MESSAGE_FIELD_CONFIG);
    +    if(messageField == null) {
    +      throw new IllegalStateException("You must specify a message field in the message supplier config.  " +
    +              "I expected to find a \"messageField\" field in the config.");
    --- End diff --

    haha


> Support parser chaining
> -----------------------
>
>                 Key: METRON-1644
>                 URL: https://issues.apache.org/jira/browse/METRON-1644
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Casey Stella
>            Priority: Major
>
> Currently we have only one layer of parsing prior to enrichment, but often
> real data is more complex.  For instance, often data is wrapped in an
> envelope (e.g. syslog data which contains a field which needs to be parsed).
> This Jira allows us to support a DAG of parsers prior to enrichment by
> allowing us to provide a strategy for interpreting what is data and what is
> metadata in the parser bolt.  This enables upstream parsers to pass in a JSON
> Blob which contains the metadata and have the parser bolt choose which field
> is the data to be parsed, the remaining fields would be considered metadata
> (and merged into the resulting data or not depending on our existing rules
> for handling metadata).
>
> To illustrate this better, I've provided a use-case with an example.  Note,
> this PR depends on METRON-1643 and METRON-1642, so those should be reviewed
> prior to this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
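
For readers skimming the archive, here is a minimal sketch of the envelope idea described in the issue: one configured field of a JSON Map is the data to be parsed, and every other field is treated as metadata. This is not the code from the pull request. The class EnvelopeSplitSketch, its Split holder, and the split method are hypothetical names made up for illustration, the envelope is assumed to be already deserialized into a Map, and throwing when the field is absent from the envelope is an assumption rather than confirmed behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; hypothetical names, not the Metron implementation.
class EnvelopeSplitSketch {

  // Hypothetical result holder: the payload to hand to the next parser,
  // plus the remaining envelope fields treated as metadata.
  static final class Split {
    final String data;
    final Map<String, Object> metadata;

    Split(String data, Map<String, Object> metadata) {
      this.data = data;
      this.metadata = metadata;
    }
  }

  // Splits an already-deserialized envelope into data and metadata.
  // messageField plays the role of the "messageField" config entry.
  static Split split(Map<String, Object> envelope, String messageField) {
    if (messageField == null) {
      throw new IllegalStateException("A \"messageField\" must be configured.");
    }
    Object data = envelope.get(messageField);
    if (data == null) {
      // Assumption: a missing data field is treated as an error in this sketch.
      throw new IllegalStateException("Envelope has no field named " + messageField);
    }
    // Copy so the caller's map is left untouched, then drop the data field:
    // the data field itself is not considered metadata.
    Map<String, Object> metadata = new HashMap<>(envelope);
    metadata.remove(messageField);
    return new Split(data.toString(), metadata);
  }
}
```

Calling EnvelopeSplitSketch.split(envelope, "data") on a map holding "data" and "host" entries would yield the "data" value as the payload and a metadata map containing only "host". Whether that metadata is then merged into the parsed message is left to the existing metadata rules mentioned in the issue.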