[ https://issues.apache.org/jira/browse/DRILL-4558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15218975#comment-15218975 ]
Steven Phillips commented on DRILL-4558:
----------------------------------------

This looks like a problem in the BsonRecordReader:

{code}
  private void writeString(String readString, final MapOrListWriterImpl writer, String fieldName, boolean isList) {
    final int length = readString.length();
    final VarCharHolder vh = new VarCharHolder();
    ensure(length);
    try {
      workBuf.setBytes(0, readString.getBytes("UTF-8"));
    } catch (UnsupportedEncodingException e) {
      throw new DrillRuntimeException("Unable to read string value for field: " + fieldName, e);
    }
    vh.buffer = workBuf;
    vh.start = 0;
    vh.end = length;
    if (isList == false) {
      writer.varChar(fieldName).write(vh);
    } else {
      writer.list.varChar().write(vh);
    }
  }
{code}

The length variable should be the length of the byte array, not the length of the String.

A quick workaround would be to disable the BSON record reader:

set store.mongo.bson.record.reader = false;

> When a query returns diacritics in a string, the string is cut
> --------------------------------------------------------------
>
>                 Key: DRILL-4558
>                 URL: https://issues.apache.org/jira/browse/DRILL-4558
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - MongoDB
>         Environment: Apache Drill 1.6
> MongoDB 3.2.1
>            Reporter: Vincent Uribe
>
> With the following document in the collection "Test" of the database testDb:
> {
>     "_id" : ObjectId("56e7f1bd0944228aab06d0e2"),
>     "ID_ATTRIBUT" : "3",
>     "VAL_ATTRIBUT" : "Végétaux",
>     "UPDATED" : ISODate("2016-01-09T23:00:00.000Z")
> }
> When querying select * from mongoStorage.testDb.Test, I get:
> _id: [B@affb65
> ID_ATTRIBUT: 3
> VAL_ATTRIBUT: *Végéta*
> UPDATED: 2016-01-09T23:00:00.000Z
> As you can see, the two 'é' characters cut the string "Végétaux" short by 2 characters, giving
> "Végéta".
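A minimal sketch of the byte-length issue described in the comment, in plain Java. It does not use any Drill classes. A byte[] and an explicit end offset stand in for workBuf and VarCharHolder, and the class and method names below are hypothetical. Taking the end offset from String.length() reproduces the truncation in this report ("Végétaux" becomes "Végéta", because each 'é' takes 2 bytes in UTF-8). Taking it from the encoded byte array keeps the full value.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical illustration of the writeString length bug.
 * A plain byte[] plus an end offset stands in for Drill's workBuf/VarCharHolder.
 */
public class Utf8LengthDemo {

  // Buggy variant: the end offset is the String's UTF-16 char count, not its byte count.
  static String roundTripByCharLength(String value) {
    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
    int end = value.length(); // 8 for "Végétaux", although 10 bytes were encoded
    return new String(bytes, 0, end, StandardCharsets.UTF_8);
  }

  // Fixed variant: the end offset is the length of the encoded byte array.
  static String roundTripByByteLength(String value) {
    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
    int end = bytes.length;
    return new String(bytes, 0, end, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // "Végétaux", escaped so the source file stays ASCII-only.
    String value = "V\u00e9g\u00e9taux";
    System.out.println(value.length());                                       // 8
    System.out.println(value.getBytes(StandardCharsets.UTF_8).length);        // 10
    System.out.println(roundTripByCharLength(value).equals("V\u00e9g\u00e9ta")); // true
    System.out.println(roundTripByByteLength(value).equals(value));          // true
  }
}
```

Applied to writeString, this presumably means encoding the string once, passing the byte array's length to ensure(...), and setting vh.end to that same length. That would keep the buffer size consistent with what setBytes actually writes. Using StandardCharsets.UTF_8 also removes the need to catch UnsupportedEncodingException. This is a sketch under those assumptions, not the committed Drill fix.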