[ 
https://issues.apache.org/jira/browse/FLUME-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320192#comment-14320192
 ] 

Francis commented on FLUME-2126:
--------------------------------

I think I found how to fix this bug once and for all. The problem is caused by 
the way a "complex field" is added. The ES XContent classes are used to parse 
the data in the detected format, but then, instead of adding the parsed data, 
the string() method is called, which converts it back to a string that is the 
same as the initial data! Here is the current code with added comments:
{code}
// This tmp builder is completely useless.
XContentBuilder tmp = jsonBuilder();
parser = XContentFactory.xContent(contentType).createParser(data);
parser.nextToken();
// This copies the whole parsed data into this tmp builder.
tmp.copyCurrentStructure(parser);
// Here, by calling tmp.string(), we get the parsed data converted back to a
// string. This means that tmp.string() == String(data)!
// All this parsing for nothing...
// And then, as the field(String, String) method is called on the builder, and
// the builder being a jsonBuilder, the string will be escaped according to the
// JSON specification.
builder.field(fieldName, tmp.string());
{code}
If we really want to take advantage of the XContent classes, we have to add the 
parsed data to the builder. To do this, it is as simple as:
{code}
parser = XContentFactory.xContent(contentType).createParser(data);
parser.nextToken();
// Add the field name, but not the value.
builder.field(fieldName);
// This will add the whole parsed content as the value of the field.
builder.copyCurrentStructure(parser);
{code}
I tried this and it works as expected. This is almost the same as the initial 
workaround posted in this bug description. I don't understand why it hasn't 
been used as a starting point to fix this bug.
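
To make the difference concrete without pulling in Elasticsearch, here is a 
small dependency-free sketch. It is not the XContent API: the class and method 
names are hypothetical, plain string concatenation stands in for the builder, 
and the escaping is deliberately minimal. It only shows what the resulting 
document looks like in each case.

```java
public class ComplexFieldDemo {

    // Minimal JSON string escaping (quotes, backslashes, newlines only).
    // A real JSON builder escapes more characters than this.
    static String escapeJson(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Stand-in for builder.field(fieldName, tmp.string()):
    // the JSON body is written as an escaped string value.
    static String asStringField(String fieldName, String jsonBody) {
        return "{\"" + fieldName + "\":\"" + escapeJson(jsonBody) + "\"}";
    }

    // Stand-in for builder.field(fieldName) followed by
    // builder.copyCurrentStructure(parser): the body is nested as structure.
    static String asStructuredField(String fieldName, String jsonBody) {
        return "{\"" + fieldName + "\":" + jsonBody + "}";
    }

    public static void main(String[] args) {
        String body = "{\"user\":\"bob\"}"; // illustrative event body

        // prints {"body":"{\"user\":\"bob\"}"}
        System.out.println(asStringField("body", body));

        // prints {"body":{"user":"bob"}}
        System.out.println(asStructuredField("body", body));
    }
}
```

In the first output the event body ends up as one opaque, escaped string, 
which is what the current code produces. In the second it is a nested object, 
which is what we get when the parsed structure is copied into the builder.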

> Problem in elasticsearch sink when the event body is a complex field
> --------------------------------------------------------------------
>
>                 Key: FLUME-2126
>                 URL: https://issues.apache.org/jira/browse/FLUME-2126
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>         Environment: 1.3.1 and 1.4
>            Reporter: Massimo Paladin
>            Assignee: Ashish Paliwal
>         Attachments: FLUME-2126-0.patch
>
>
> I have found a bug in the elasticsearch sink. The problem is in the 
> {{ContentBuilderUtil.addComplexField}} method: when it does 
> {{builder.field(fieldName, tmp);}} the {{tmp}} object is taken as an 
> {{Object}} and is therefore serialized with the {{toString}} method in the 
> {{XContentBuilder}}. In the end you get the object reference as content.
> The following change works around the problem for me. The downside is that it 
> has to parse the content twice. I guess there is a better way to solve the 
> problem, but I am not an elasticsearch API expert. 
> {code}
> --- a/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java
> +++ b/flume-ng-sinks/flume-ng-elasticsearch-sink/src/main/java/org/apache/flume/sink/elasticsearch/ContentBuilderUtil.java
> @@ -61,7 +61,12 @@ public class ContentBuilderUtil {
>        parser = XContentFactory.xContent(contentType).createParser(data);
>        parser.nextToken();
>        tmp.copyCurrentStructure(parser);
> -      builder.field(fieldName, tmp);
> +
> +      // if it is a valid structure then we include it
> +      parser = XContentFactory.xContent(contentType).createParser(data);
> +      parser.nextToken();
> +      builder.field(fieldName);
> +      builder.copyCurrentStructure(parser);
>      } catch (JsonParseException ex) {
>        // If we get an exception here the most likely cause is nested JSON that
>        // can't be figured out in the body. At this point just push it through
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
