[jira] [Commented] (FLUME-2649) Elasticsearch sink doesn't handle JSON fields correctly

Hari Shreedharan (JIRA) Mon, 20 Apr 2015 23:11:14 -0700

    [ 
https://issues.apache.org/jira/browse/FLUME-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504410#comment-14504410
 ]


Hari Shreedharan commented on FLUME-2649:
-----------------------------------------

+1. Committing. Thanks [~bfiorini] for the patch and [~ejsarge] for the review!

> Elasticsearch sink doesn't handle JSON fields correctly
> -------------------------------------------------------
>
>                 Key: FLUME-2649
>                 URL: https://issues.apache.org/jira/browse/FLUME-2649
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>            Reporter: Francis
>            Assignee: Benjamin Fiorini
>         Attachments: FLUME-2649-0.patch, FLUME-2649-1.patch, 
> FLUME-2649-2.patch, FLUME-2649-3.patch, FLUME-2649-4.patch, 
> FLUME-2649-5.patch, FLUME-2649-6.patch
>
>
> JSON attributes are treated like normal strings and are escaped by the sink. 
> For example, if the body or a header contains the following value:
> {code:javascript}
> {"foo":"bar"}
> {code}
> It will be added like this in Elasticsearch:
> {code:javascript}
> {"@message": "{\"foo\":\"bar\"}"}
> {code}
> We end up with a plain string instead of a valid JSON field.
> I think I found how to fix this bug. The problem is caused by the way a 
> "complex field" is added. The ES XContent classes are used to parse the data 
> in the detected format, but then, instead of adding the parsed data, the 
> string() method is called, which converts it back to a string identical to 
> the initial data! Here is the current code with added comments:
> {code}
> // This tmp builder is completely useless.
> XContentBuilder tmp = jsonBuilder();
> parser = XContentFactory.xContent(contentType).createParser(data);
> parser.nextToken();
> // This copies the whole parsed data in this tmp builder.
> tmp.copyCurrentStructure(parser);
> // Here, by calling tmp.string(), we get the parsed data converted back to a
> // string. This means that tmp.string() == String(data)!
> // All this parsing for nothing...
> // And then, as the field(String, String) method is called on the builder,
> // and the builder being a jsonBuilder, the string will be escaped according
> // to the JSON specifications.
> builder.field(fieldName, tmp.string());
> {code}
> If we really want to take advantage of the XContent classes, we have to add 
> the parsed data to the builder. To do this, it is as simple as:
> {code}
> parser = XContentFactory.xContent(contentType).createParser(data);
> parser.nextToken();
> // Add the field name, but not the value.
> builder.field(fieldName);
> // This will add the whole parsed content as the value of the field.
> builder.copyCurrentStructure(parser);
> {code}
> I tried this and it works as expected.
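
For readers without an Elasticsearch classpath at hand, the difference between the two code paths quoted above can be sketched with plain string handling. This is only an illustration under stated assumptions: the class JsonFieldSketch and its methods are hypothetical, they are not part of Flume or Elasticsearch, and the escaping covers only quotes and backslashes. The input is assumed to be valid JSON already.

```java
// Hypothetical, stdlib-only illustration; not the Flume or Elasticsearch API.
class JsonFieldSketch {

    // Minimal JSON string escaping (quotes and backslashes only, for brevity).
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Mimics the reported behaviour: the JSON text is written as a string
    // value, so its quotes get escaped and the structure is lost.
    static String asStringField(String name, String json) {
        return "{\"" + name + "\":\"" + escape(json) + "\"}";
    }

    // Mimics the proposed behaviour: the already-valid JSON value is embedded
    // as the field's value, so it remains a nested object.
    static String asStructuredField(String name, String json) {
        return "{\"" + name + "\":" + json + "}";
    }

    public static void main(String[] args) {
        String body = "{\"foo\":\"bar\"}";
        System.out.println(asStringField("@message", body));
        System.out.println(asStructuredField("@message", body));
    }
}
```

The first call yields a document whose @message is one escaped string, which matches the symptom in the report; the second keeps foo as a nested field, which is what copying the parsed structure into the builder is meant to achieve.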



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
