[
https://issues.apache.org/jira/browse/FLUME-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703559#comment-14703559
]
Matt Wise commented on FLUME-2126:
----------------------------------
Hey guys, I'm re-opening this because of a pretty big problem this creates. If
you have previously been handling structured logs in your own way (possibly
passing JSON in your @message field and ignoring it), this change suddenly
starts parsing that JSON and passes it into ElasticSearch as structured data.
The problem here is that if your schema in ElasticSearch says that @message is
a string (the default behavior prior to Flume 1.6.0), then the new structured
data fails to be imported and you run into schema problems. This has the
potential to break anyone's ElasticSearch/Flume setup the moment they try to
pass in a JSON log message -- regardless of whether they want it "structured"
or not.
At the absolute very least, this needs to be a configurable option. We handle
our structured data through a custom sink (that the
ElasticSearchLogStashEventSerializer) that automatically finds structured data,
parses it, and creates named fields that have the type in them (e.g., { "foo":
"bar" } becomes { "@data.foo__str": "bar" }).
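
To make the naming scheme above concrete, here is a minimal sketch in plain Java of how typed field names could be derived. It is not the actual custom sink: the class name, the "@data" prefix handling, and every suffix other than "__str" (the only one mentioned above) are assumptions made for illustration, and the input is an already-parsed Map rather than raw JSON.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TypedFieldFlattener {
    // Hypothetical suffix scheme: only "__str" comes from the comment above,
    // the other suffixes are invented for this sketch.
    public static String suffixFor(Object value) {
        if (value instanceof String) return "__str";
        if (value instanceof Boolean) return "__bool";
        if (value instanceof Integer || value instanceof Long) return "__long";
        if (value instanceof Number) return "__double";
        return "__str"; // anything else (including null) is stored as a string
    }

    // Flattens nested maps into dotted field names that carry a type suffix.
    public static Map<String, Object> flatten(String prefix, Map<String, ?> data) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto(prefix, data, out);
        return out;
    }

    private static void flattenInto(String prefix, Map<String, ?> data,
                                    Map<String, Object> out) {
        for (Map.Entry<String, ?> e : data.entrySet()) {
            String name = prefix + "." + e.getKey();
            Object v = e.getValue();
            if (v instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, ?> nested = (Map<String, ?>) v;
                flattenInto(name, nested, out);
            } else {
                String suffix = suffixFor(v);
                // Values tagged as strings are stored in their string form.
                Object stored = suffix.equals("__str") ? String.valueOf(v) : v;
                out.put(name + suffix, stored);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> in = new LinkedHashMap<>();
        in.put("foo", "bar");
        System.out.println(flatten("@data", in)); // prints {@data.foo__str=bar}
    }
}
```

Because the type is part of the field name, a field that is a string in one event and a number in another ends up under two different names, so it cannot collide with an existing mapping for either one.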
> Problem in elasticsearch sink when the event body is a complex field
> --------------------------------------------------------------------
>
> Key: FLUME-2126
> URL: https://issues.apache.org/jira/browse/FLUME-2126
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Environment: 1.3.1 and 1.4
> Reporter: Massimo Paladin
> Fix For: v1.6.0
>
> Attachments: FLUME-2126-0.patch
>
>
> I have found a bug in the elasticsearch sink. The problem is in the
> {{ContentBuilderUtil.addComplexField}} method: when it calls
> {{builder.field(fieldName, tmp);}}, the {{tmp}} object is treated as a plain
> {{Object}} and is therefore serialized with its {{toString}} method in the
> {{XContentBuilder}}. In the end you get the object reference as content.
> The following change works around the problem for me. The downside is that it
> has to parse the content twice; I guess there is a better way to solve the
> problem, but I am not an elasticsearch API expert.
> {code}
> --- a/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java
> +++ b/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java
> @@ -61,7 +61,12 @@ public class ContentBuilderUtil {
> parser = XContentFactory.xContent(contentType).createParser(data);
> parser.nextToken();
> tmp.copyCurrentStructure(parser);
> - builder.field(fieldName, tmp);
> +
> + // if it is a valid structure then we include it
> + parser = XContentFactory.xContent(contentType).createParser(data);
> + parser.nextToken();
> + builder.field(fieldName);
> + builder.copyCurrentStructure(parser);
> } catch (JsonParseException ex) {
> // If we get an exception here the most likely cause is nested JSON that
> // can't be figured out in the body. At this point just push it through
> {code}
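
The symptom described in the quoted issue (the object reference ending up as the field content) can be reproduced without any Elasticsearch classes. The sketch below is only an analogy in plain Java: `IntermediateBuilder` and `renderField` are hypothetical stand-ins, not the real `XContentBuilder` API. It shows what happens when a generic `Object` value is written through a `toString()` fallback and the object does not override `toString()`.

```java
class ToStringFallbackDemo {
    // Stand-in for the intermediate builder object; it does NOT override toString().
    static final class IntermediateBuilder {
        final String json;

        IntermediateBuilder(String json) {
            this.json = json;
        }
    }

    // Mimics a generic field(name, Object) overload that falls back to toString().
    public static String renderField(String name, Object value) {
        return "\"" + name + "\": \"" + String.valueOf(value) + "\"";
    }

    public static void main(String[] args) {
        IntermediateBuilder tmp = new IntermediateBuilder("{\"foo\": \"bar\"}");
        // The rendered value is the class name plus a hash code, not the JSON.
        System.out.println(renderField("@message", tmp));
        // Passing the content itself avoids the fallback.
        System.out.println(renderField("@message", tmp.json));
    }
}
```

The patch in the quoted issue sidesteps this kind of fallback by copying the parsed structure into the target builder directly instead of handing over the intermediate object.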