[ https://issues.apache.org/jira/browse/THRIFT-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560251#comment-13560251 ]
Nabeel Shahzad edited comment on THRIFT-1841 at 1/23/13 12:49 AM:
------------------------------------------------------------------

So the issue is this, in lib/thrift/transport.js:
{noformat}
readString: function(len) {
  var str = this.inBuf.toString('utf8', this.readPos, this.readPos + len);
  this.readPos += len;
  return str;
}
{noformat}
It seems that readString() is being called for ALL types. I haven't had time to see exactly why, but from a cursory look at where the problem shows up, it seems the field type that gets read is always set to string. Not sure if this is an error in the Thrift file, or something else...

I've had to move on from this bug, since I have a workaround... unfortunately, I'm several days behind schedule. I've worked around it for myself by doing:
{noformat}
var str = this.inBuf.slice(this.readPos, this.readPos + len);
{noformat}
Then I get the raw bytes (a Buffer) back, and parse the fields according to their type.

Hopefully this helps someone else and can narrow down the issue. In my "spare time", I will try to find the exact issue, but hopefully this puts whoever is looking at this on the right path.

I updated this bug's title to reflect the fact that it affects all non-string types.

> NodeJS Thrift incorrectly parses non-UTF8-string types
> ------------------------------------------------------
>
>                 Key: THRIFT-1841
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1841
>             Project: Thrift
>          Issue Type: Bug
>          Components: Node.js - Compiler, Node.js - Library
>    Affects Versions: 0.9, 1.0
>            Reporter: Nabeel Shahzad
>
> When a double/float is used in a map (key or value), list, or set type, the
> decoding is done as a utf8 string, which then parses incorrectly and adds
> extra bytes.
> For example:
> The bytes of a map<double, double> (this is coming out of the Thrift call):
> {noformat}
> 00 01 00 08 3f f4 00 00 00 00 00 00 00 08 40 02 00 00 00 00 00 00
> {noformat}
> But after it's been parsed out from the field as UTF8:
> {noformat}
> 00 01 00 08 3f 3f 00 00 00 00 00 00 00 08 40 02 00 00 00 00 00 00
> {noformat}
> As you can see there's an incorrect byte (a 3f where the f4 should be, and an
> extra 00). For reference, this value was map<double, double> = {1.25: 2.25}.
> This is the same behavior for floats. The f4 is 244 in decimal, which I
> believe isn't valid UTF-8 on its own.
> The actual value of the field becomes:
> {noformat}
> value:
> '\u0000\u0002\u0000\b??\u0000\u0000\u0000\u0000\u0000\u0000\u0000\b@\u0002\u0000\u0000\u0000\u0000\u0000\u0000''
> {noformat}
> Where the \b = 8, ? = f4, ? = unknown char.
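For anyone trying to reproduce this, here is a minimal Node.js sketch (not the library's code). It takes the {1.25: 2.25} bytes from the dump above, shows that a UTF-8 decode like the one in readString() does not preserve them, and that reading from a slice does. The helper names dumpToBuffer and readDoubleMap are hypothetical, and the layout (2-byte count, then a 2-byte length before each key and value) is assumed from the dump, not taken from any spec.

```javascript
// Hypothetical helper: turn a spaced hex dump into a Buffer.
function dumpToBuffer(dump) {
  return Buffer.from(dump.replace(/\s+/g, ''), 'hex');
}

// Bytes of map<double, double> = {1.25: 2.25}, copied from the issue.
const raw = dumpToBuffer(
  '00 01 00 08 3f f4 00 00 00 00 00 00 00 08 40 02 00 00 00 00 00 00'
);

// What readString() does: decode the bytes as UTF-8. 0xf4 followed by
// 0x00 is not a valid UTF-8 sequence, so the decoder substitutes a
// replacement character and the original byte is lost.
const asString = raw.toString('utf8');
const roundTripped = Buffer.from(asString, 'utf8');

// The workaround: keep the bytes as a Buffer instead of decoding them.
const asBytes = raw.slice(0, raw.length);

// Hypothetical parser for the layout assumed above.
function readDoubleMap(buf) {
  let pos = 0;
  const count = buf.readUInt16BE(pos);
  pos += 2;
  const entries = [];
  for (let i = 0; i < count; i++) {
    const pair = [];
    for (let j = 0; j < 2; j++) { // key first, then value
      const len = buf.readUInt16BE(pos);
      pos += 2;
      if (len !== 8) throw new Error('expected an 8-byte double, got ' + len);
      pair.push(buf.readDoubleBE(pos));
      pos += len;
    }
    entries.push(pair);
  }
  return entries;
}

console.log(roundTripped.equals(raw)); // false
console.log(readDoubleMap(asBytes));   // [ [ 1.25, 2.25 ] ]
```

Current Node versions substitute U+FFFD for the invalid byte, so the corrupted bytes will not match the 3f in the dump exactly, but the original f4 is gone either way.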
> I have seen cases where there are *extra* bytes added in, which breaks the
> parsing based on byte size:
> {noformat}
> 00 01 00 08 40 24 48 72 c2 b0 20 c3 84 c2 9c 00 08 40 34 c3 bc c3 93 5a c2 85
> c2 87 c2 94
> {noformat}
> Where the map value was {10.1415: 20.9876}. On a list or set, using either
> value also yields extra bytes.
> So this messes up any parsing based on the byte length of the field, since
> a variable number of extra bytes is added, either to the key or value
> of the map, and to any values of a list. I believe this could also happen on
> large integer values.
> It seems to me that when the "ftype" is parsed (int16) before the actual field,
> it's returning a TYPE value of "11" (string) instead of the proper value for
> a map/set/list.
> For reference, the table, and an insert example:
> {noformat}
> CREATE TABLE sample_map (
>   id text PRIMARY KEY,
>   map_col_text map < text, text >,
>   map_col_int map < int, text >,
>   map_col_float map < float, float >,
>   map_col_double map < double, double >
> );
> INSERT INTO sample_map (id, map_col_double) VALUES('DOUBLE_ROW_SINGLE',
> {10.1415: 20.9876});
> {noformat}
> Not sure if it matters, but this was using CQL3. Also, we are not seeing this
> on the C++ generated Thrift interface.
> Versions:
> {noformat}
> cqlsh:orion> show version;
> [cqlsh 2.3.0 | Cassandra 1.2.0 | CQL spec 3.0.0 | Thrift protocol 19.35.0]
> {noformat}
> {noformat}
> $ thrift --version
> Thrift version 0.9.0
> {noformat}
> {noformat}
> "name": "node-thrift",
> "description": "node.js bindings for the Apache Thrift RPC system",
> "homepage": "http://thrift.apache.org/",
> "repository": {
>   "type": "svn",
>   "url": "http://svn.apache.org/repos/asf/thrift/trunk/"
> },
> "version": "1.0.0-dev",
> {noformat}
> The issue also appears in the 0.9.0 version of the thrift library.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
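A note on the *extra* bytes in the {10.1415: 20.9876} dump from the issue: they are exactly what comes out if each original byte is treated as one character and the resulting string is then written out as UTF-8, so every byte at or above 0x80 grows to two bytes. The sketch below checks that against the reported dump. It does not show where in the library this happens. The helper name dumpToBuffer is hypothetical, and the clean bytes are the big-endian doubles for the two values in the same count/length framing as the {1.25: 2.25} example, which is an assumption.

```javascript
// Hypothetical helper: turn a spaced hex dump into a Buffer.
function dumpToBuffer(dump) {
  return Buffer.from(dump.replace(/\s+/g, ''), 'hex');
}

// Assumed uncorrupted bytes: count, then length-prefixed big-endian
// doubles for 10.1415 (key) and 20.9876 (value).
const clean = dumpToBuffer(
  '00 01 00 08 40 24 48 72 b0 20 c4 9c 00 08 40 34 fc d3 5a 85 87 94'
);

// The corrupted bytes exactly as reported in the issue.
const reported = dumpToBuffer(
  '00 01 00 08 40 24 48 72 c2 b0 20 c3 84 c2 9c 00 08 40 34 c3 bc c3 93 5a c2 85 c2 87 c2 94'
);

// Treat each byte as one character (latin1), then encode that string as UTF-8.
const reencoded = Buffer.from(clean.toString('latin1'), 'utf8');

// The clean bytes decode to the values used in the INSERT.
console.log(clean.readDoubleBE(4), clean.readDoubleBE(14));

// The re-encoded bytes match the reported dump byte for byte.
console.log(reencoded.equals(reported)); // true
console.log(clean.length, reencoded.length); // 22 30
```

This would explain why the number of extra bytes varies from value to value: it depends on how many bytes of each double happen to be 0x80 or higher.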