[ https://issues.apache.org/jira/browse/FLUME-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504410#comment-14504410 ]
Hari Shreedharan commented on FLUME-2649:
-----------------------------------------

+1. Committing. Thanks [~bfiorini] for the patch and [~ejsarge] for the review!

> Elasticsearch sink doesn't handle JSON fields correctly
> -------------------------------------------------------
>
>                 Key: FLUME-2649
>                 URL: https://issues.apache.org/jira/browse/FLUME-2649
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>            Reporter: Francis
>            Assignee: Benjamin Fiorini
>         Attachments: FLUME-2649-0.patch, FLUME-2649-1.patch,
> FLUME-2649-2.patch, FLUME-2649-3.patch, FLUME-2649-4.patch,
> FLUME-2649-5.patch, FLUME-2649-6.patch
>
>
> JSON attributes are treated like normal strings and are escaped by the sink.
> For example, if the body or a header contains the following value:
> {code:javascript}
> {"foo":"bar"}
> {code}
> It will be added like this in Elasticsearch:
> {code:javascript}
> {"@message": "{\"foo\":\"bar\"}"}
> {code}
> We end up with a plain string instead of a valid JSON field.
> I think I found how to fix this bug. The problem is caused by the way a
> "complex field" is added. The ES XContent classes are used to parse the data
> in the detected format, but then, instead of adding the parsed data, the
> string() method is called, which converts it back to a string identical to
> the initial data! Here is the current code with added comments:
> {code}
> XContentBuilder tmp = jsonBuilder(); // This tmp builder is completely useless.
> parser = XContentFactory.xContent(contentType).createParser(data);
> parser.nextToken();
> tmp.copyCurrentStructure(parser); // This copies the whole parsed data into this tmp builder.
> // Here, by calling tmp.string(), we get the parsed data converted back to a string.
> // This means that tmp.string() == String(data)!
> // All this parsing for nothing...
> // And then, as the field(String, String) method is called on the builder,
> // and the builder being a jsonBuilder, the string will be escaped according
> // to the JSON specifications.
> builder.field(fieldName, tmp.string());
> {code}
> If we really want to take advantage of the XContent classes, we have to add
> the parsed data to the builder. To do this, it is as simple as:
> {code}
> parser = XContentFactory.xContent(contentType).createParser(data);
> parser.nextToken();
> // Add the field name, but not the value.
> builder.field(fieldName);
> // This will add the whole parsed content as the value of the field.
> builder.copyCurrentStructure(parser);
> {code}
> I tried this and it works as expected.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
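
To make the difference described in the issue concrete, here is a minimal, dependency-free sketch in plain Java. It does not use the Elasticsearch XContent classes. The class `JsonFieldDemo` and its methods are hypothetical stand-ins: `fieldAsString` imitates what `builder.field(fieldName, tmp.string())` produces, and `fieldAsStructure` imitates the effect of `builder.field(fieldName)` followed by `builder.copyCurrentStructure(parser)`. The escaping is simplified to quotes and backslashes only.

```java
public class JsonFieldDemo {

    /** Simplified JSON string escaping: only quotes and backslashes are handled. */
    public static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Imitates the reported behaviour: the payload is written as an escaped string value. */
    public static String fieldAsString(String name, String json) {
        return "{\"" + name + "\":\"" + escape(json) + "\"}";
    }

    /** Imitates the proposed behaviour: the payload is embedded as a nested structure. */
    public static String fieldAsStructure(String name, String json) {
        return "{\"" + name + "\":" + json + "}";
    }

    public static void main(String[] args) {
        String body = "{\"foo\":\"bar\"}";
        // prints {"@message":"{\"foo\":\"bar\"}"}
        System.out.println(fieldAsString("@message", body));
        // prints {"@message":{"foo":"bar"}}
        System.out.println(fieldAsStructure("@message", body));
    }
}
```

The first output is what the issue reports ending up in Elasticsearch (a plain string); the second is the nested JSON field the patch aims for. Unlike the real parser-based fix, this sketch does not validate that the payload is well-formed JSON before embedding it.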