[jira] [Commented] (THRIFT-1023) Thrift encoding (UTF-8) issue with Ruby 1.9.2
[ https://issues.apache.org/jira/browse/THRIFT-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403679#comment-13403679 ]

Nathan Beyer commented on THRIFT-1023:
--------------------------------------

Is there any additional design doc or code doc for the Ruby classes? I'm trying to discern which APIs work with bytes and which APIs work with characters.

For example, [Thrift::Transport::BaseTransport|http://svn.apache.org/repos/asf/thrift/trunk/lib/rb/lib/thrift/transport/base_transport.rb] - do the read methods return bytes or characters, and do the writes accept bytes or characters? I'm assuming it's bytes, since the variables are named 'buf' as opposed to the 'str' names used in the protocol APIs, but I want to double-check.

> Thrift encoding (UTF-8) issue with Ruby 1.9.2
> ---------------------------------------------
>
>                 Key: THRIFT-1023
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1023
>             Project: Thrift
>          Issue Type: Bug
>          Components: Ruby - Library
>    Affects Versions: 0.5
>         Environment: OSX, Ruby 1.9.2, Thrift Gem version 0.5.0
>            Reporter: Vincent Peres
>            Assignee: Jake Farrell
>         Attachments: thrift-1023-utf8-encoding-issue.path
>
>
> I came across an encoding issue coming from the Thrift library, and
> especially the BufferedTransport class.
> I've written a few tests to give you a concrete example:
>
> # encoding: utf-8
> require 'spec_helper'
> describe "encoding" do
>   before do
>     transport = Thrift::BufferedTransport.new(Thrift::Socket.new(MR_CONFIG['host'], 9090))
>     protocol = Thrift::BinaryProtocol.new(transport)
>     @client = Apache::Hadoop::Hbase::Thrift::Hbase::Client.new(protocol)
>     transport.open()
>     @table_name = "encoding_test"
>     @column_family = "info:"
>   end
>   it "should create a new table" do
>     column = Apache::Hadoop::Hbase::Thrift::ColumnDescriptor.new{|c| c.name = @column_family}
>     @client.createTable(@table_name, [column]).should be_nil
>   end
>   it "should save standard caracteres" do
>     m = Apache::Hadoop::Hbase::Thrift::Mutation.new
>     m.column = "info:first_name"
>     m.value = "Vincent"
>     m.value.encoding.should == Encoding::UTF_8
>     @client.mutateRow(@table_name, "ID1", [m]).should be_nil
>   end
>   it "should save UTF8 caracteres" do
>     m = Apache::Hadoop::Hbase::Thrift::Mutation.new
>     m.column = "info:first_name"
>     m.value = "Thorbjørn"
>     m.value.encoding.should == Encoding::UTF_8
>     @client.mutateRow(@table_name, "ID1", [m]).should be_nil
>   end
>   it "should destroy the table" do
>     @client.disableTable(@table_name).should be_nil
>     @client.deleteTable(@table_name).should be_nil
>   end
> end
>
> It fails when it tries to save the UTF-8 string that includes the character 'ø'.
> Here is the output:
>
> 1) encoding should save UTF8 caracteres
>    Failure/Error: @client.mutateRow(@table_name, "ID1", [m]).should be_nil
>    incompatible character encodings: ASCII-8BIT and UTF-8
>    # /Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/transport/buffered_transport.rb:59:in `write'
>    # /Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/protocol/binary_protocol.rb:107:in `write_string'
>    # /Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/client.rb:35:in `write'
>    # /Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/client.rb:35:in `send_message'
>    # ./lib/thrift/hbase.rb:289:in `send_mutateRow'
>    # ./lib/thrift/hbase.rb:284:in `mutateRow'
>    # ./spec/thrift/cases/encoding_spec.rb:37:in `block (2 levels) in <top (required)>'
>
> Let me know if you need any other details, thank you!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
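
The bytes-versus-characters question in the comment above can be made concrete with a small sketch. It is written in Python 3 rather than Ruby, and `SketchTransport` and `write_string` are hypothetical names, not the Thrift Ruby API. The only protocol detail assumed is that the binary protocol frames a string as a 4-byte big-endian length followed by the raw bytes. In this sketch the transport accepts bytes only, and the protocol layer is the single place where characters are encoded to UTF-8.

```python
import struct


class SketchTransport:
    """Hypothetical byte-oriented transport: it buffers bytes, never characters."""

    def __init__(self):
        self._wbuf = bytearray()

    def write(self, buf):
        # Reject character data early instead of failing inside a concatenation.
        if not isinstance(buf, (bytes, bytearray)):
            raise TypeError("transport expects bytes, got %s" % type(buf).__name__)
        self._wbuf += buf

    def getvalue(self):
        return bytes(self._wbuf)


def write_string(transport, text):
    """Encode characters to UTF-8, then write a length-prefixed byte string."""
    data = text.encode("utf-8")
    # The length prefix counts encoded bytes, not characters.
    transport.write(struct.pack(">i", len(data)))
    transport.write(data)


transport = SketchTransport()
write_string(transport, "Thorbjørn")
frame = transport.getvalue()
# Character count versus encoded byte count of the payload.
print(len("Thorbjørn"), len(frame) - 4)
```

The point of the design is that the character-to-byte conversion happens exactly once, at the protocol boundary, so the transport buffer never has to combine data in two different encodings.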
[jira] [Commented] (THRIFT-1632) ruby: data corruption in thrift_native implementation of MemoryBufferTransport
[ https://issues.apache.org/jira/browse/THRIFT-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403646#comment-13403646 ]

Hudson commented on THRIFT-1632:
--------------------------------

Integrated in Thrift #508 (See [https://builds.apache.org/job/Thrift/508/])
    THRIFT-1632. rb: ruby: data corruption in thrift_native implementation of MemoryBufferTransport

This patch fixes a subtle bug whereby the read buffer was being resized but the method continued to read from the original, unresized buffer but at the wrong location. (Revision 1355198)

     Result = SUCCESS
bryanduxbury : http://svn.apache.org/viewvc/?view=rev&rev=1355198
Files :
* /thrift/trunk/lib/rb/ext/memory_buffer.c

> ruby: data corruption in thrift_native implementation of MemoryBufferTransport
> ------------------------------------------------------------------------------
>
>                 Key: THRIFT-1632
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1632
>             Project: Thrift
>          Issue Type: Bug
>          Components: Ruby - Library
>    Affects Versions: 0.7, 0.8, 0.9
>         Environment: Tested on Linux/Centos 6.0, with thrift_native.so installed
>            Reporter: Nevo Hed
>            Assignee: Nevo Hed
>              Labels: newbie, patch
>             Fix For: 0.9
>
>         Attachments: patch, test.rb, test.thrift
>
>
> Detected a failure when serializing, then deserializing a specific object
> (I think the object needs to be large enough, AND probably must have non zero
> data at a specific offset)
> $ /usr/bin/thrift --gen rb test.thrift && ruby test.rb
> Caught Thrift::ProtocolException exception: Invalid value of field x1!
> Trace:
> ./gen-rb/test_types.rb:34:in `validate'
> test.rb:15:in `read'
> test.rb:15
[jira] [Commented] (THRIFT-1023) Thrift encoding (UTF-8) issue with Ruby 1.9.2
[ https://issues.apache.org/jira/browse/THRIFT-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403634#comment-13403634 ]

Nathan Beyer commented on THRIFT-1023:
--------------------------------------

Any opinions on the level of Ruby 1.8 support that's needed?

If $KCODE is 'U', this means all Strings are UTF-8 and the character data could just be treated as-is. This will be common, as frameworks like Rails force the interpreter into this mode at bootstrap.

If $KCODE is 'N' (none/ASCII), 'E' (EUC) or 'S' (Shift JIS), then some form of transcoding would have to take place (via iconv?).

It would be fairly easy to support the Unicode/UTF-8 mode for Ruby 1.8. The other values would not be as easy to support. Do they need to be supported?

Note - I'm guessing that none of the existing Thrift Ruby code actually works in anything other than $KCODE of 'U'; at least not for code points that are outside of the ASCII range (0-127).
[jira] [Closed] (THRIFT-1632) ruby: data corruption in thrift_native implementation of MemoryBufferTransport
[ https://issues.apache.org/jira/browse/THRIFT-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Duxbury closed THRIFT-1632.
---------------------------------

       Resolution: Fixed
    Fix Version/s: 0.9

I just committed a slightly modified version of this to TRUNK. Thanks for finding the bug and for drafting the original patch, Nevo!
[jira] [Created] (THRIFT-1639) Java/Python: Serialization/Deserialization of double type using CompactProtocol
andrew watts created THRIFT-1639:
------------------------------------

             Summary: Java/Python: Serialization/Deserialization of double type using CompactProtocol
                 Key: THRIFT-1639
                 URL: https://issues.apache.org/jira/browse/THRIFT-1639
             Project: Thrift
          Issue Type: Bug
          Components: Java - Library, Python - Library
    Affects Versions: 0.8
            Reporter: andrew watts


Using the Compact Protocol, double values are not properly serialized/deserialized between Java and Python.

I have a Java server that writes a Thrift blob to a log file as a base64-encoded string; Python then reads the base64-encoded string from the log. During development I discovered that double values are not deserialized properly. On the mailing list there was speculation about a mismatch between serialization and deserialization in the two languages, and opening a ticket was recommended.

Example:
{code}
» java -cp .:../lib/libthrift-0.8.0.jar:../lib/slf4j-api-1.6.4.jar ThriftTest
fooObj.bar: 3.456
base64String: F9nO91PjpQtAAA==

» python ../python/tdouble_test.py
base64 string:  F9nO91PjpQtAAA==
foo_obj.bar:  -4.09406819342e+124    # expect 3.456
{code}

Thrift Definition:
{code}
struct FooObj {
  1: double bar
}
{code}

Java Code:
{code}
import org.apache.thrift.TException;
import org.apache.thrift.TSerializer;
import org.apache.thrift.protocol.TCompactProtocol;
import javax.xml.bind.DatatypeConverter;

public class ThriftTest {
    public static void main(String[] args) {
        final TSerializer serializer = new TSerializer(new TCompactProtocol.Factory());

        // create a FooObj with double
        final FooObj fooObj = new FooObj(3.456);
        System.out.println("fooObj.bar: " + fooObj.bar);

        // serialize to bytes
        byte[] fooObjBlob = null;
        try {
            fooObjBlob = serializer.serialize(fooObj);
        } catch (TException e) {
            e.printStackTrace();
        }

        // encode to base64 string
        final String base64String = DatatypeConverter.printBase64Binary(fooObjBlob);
        System.out.println("base64String: " + base64String);
    }
}
{code}

Python Code:
{code}
#!/bin/env python

import base64
from thrift.protocol import TCompactProtocol
from thrift.TSerialization import deserialize
from foo.ttypes import FooObj

def main():
    protocol_factory = TCompactProtocol.TCompactProtocolFactory
    base64_string = 'F9nO91PjpQtAAA=='
    print 'base64 string: ', base64_string

    # deserialize the string back into an object
    foo_blob = base64.urlsafe_b64decode(base64_string)
    foo_obj = FooObj()
    deserialize(foo_obj, foo_blob, protocol_factory=protocol_factory())
    print 'foo_obj.bar: ', foo_obj.bar

if __name__ == '__main__':
    main()
{code}
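
Decoding the blob from the report by hand suggests one possible reading of the mismatch. The sketch below is Python 3 and uses only the standard library, not the Thrift bindings. It assumes the compact-protocol layout of one field-header byte, eight payload bytes, and one stop byte, and interprets the eight payload bytes in both byte orders. It is a diagnostic aid only and makes no claim about which library is at fault or how either one is implemented.

```python
import base64
import struct

# Base64 blob copied from the report above.
blob = base64.b64decode("F9nO91PjpQtAAA==")

field_header = blob[0]  # 0x17: assumed to mean field id delta 1, compact type 7 (double)
payload = blob[1:9]     # the eight bytes that carry the double
stop_byte = blob[9]     # 0x00 ends the struct

# Interpret the same eight bytes in both byte orders.
little_endian = struct.unpack("<d", payload)[0]
big_endian = struct.unpack(">d", payload)[0]

print("little-endian:", little_endian)
print("big-endian:   ", big_endian)
```

Read as little-endian, the payload gives back the value the Java side printed; read as big-endian, it gives a value of the same sign and magnitude as the one the Python side printed, which is consistent with the two sides disagreeing on byte order for doubles.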
[jira] [Commented] (THRIFT-1632) ruby: data corruption in thrift_native implementation of MemoryBufferTransport
[ https://issues.apache.org/jira/browse/THRIFT-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403298#comment-13403298 ]

Nevo Hed commented on THRIFT-1632:
----------------------------------

Bryan,

Thanks for looking at this. I definitely approached this from an outsider's perspective, looking at the two versions (C-ruby-lib vs. ruby module), and attempted to make the broken one match the one that does not seem to obliterate the data.

So my question is - is my change functionally different from what is already in the Ruby read_into_buffer() method?
[lib/rb/lib/thrift/transport/memory_buffer_transport.rb]
[jira] [Commented] (THRIFT-1637) NPM registry does not include version 0.8
[ https://issues.apache.org/jira/browse/THRIFT-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403260#comment-13403260 ]

Dan Cromer commented on THRIFT-1637:
------------------------------------

I, too, have hit this issue.

> NPM registry does not include version 0.8
> -----------------------------------------
>
>                 Key: THRIFT-1637
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1637
>             Project: Thrift
>          Issue Type: Bug
>          Components: Node.js - Library
>    Affects Versions: 0.8
>            Reporter: B2M
>
> Version 0.8 of node now errors if the sys module is used. This issue was
> fixed in version 0.8 of Thrift but only 0.7 is available in npm.
[jira] [Commented] (THRIFT-1632) ruby: data corruption in thrift_native implementation of MemoryBufferTransport
[ https://issues.apache.org/jira/browse/THRIFT-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403216#comment-13403216 ]

Bryan Duxbury commented on THRIFT-1632:
---------------------------------------

After staring at this for a bit, I think I've figured out the bug in this function and why your patch solves the issue.

The way that this is supposed to work is that when we've consumed enough of the memory buffer, we reallocate a new one without the used-up space in the front. This is intended to save memory. However, the issue is that when we decide to resize the buffer, we don't reassign the "buf" variable to the new buffer - we've still got a pointer to the old one. But we reset the index pointer as though we have switched, which means that when you start reading again, you'll be getting the wrong data.

Your patch fixes this issue by only doing a garbage resize once after all the reading is done, thus guaranteeing that the buffer pointer and its index remain valid throughout the lifetime of the method.
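
The stale-pointer problem described in the comment above can be illustrated with a small Python 3 analogue. This is not the code in memory_buffer.c: `SketchMemoryBuffer`, its methods, and the `GARBAGE_LIMIT` threshold are hypothetical names and values chosen for the illustration. The buggy reader keeps a local reference to the old buffer while compaction resets the shared index; the fixed reader compacts only once, after all reading is done.

```python
class SketchMemoryBuffer:
    """Hypothetical stand-in for a memory buffer with a read index."""

    GARBAGE_LIMIT = 4  # assumed threshold: compact once this many bytes are consumed

    def __init__(self, data):
        self.buf = bytes(data)
        self.index = 0

    def _compact(self):
        # Drop the consumed prefix and restart the index at zero.
        self.buf = self.buf[self.index:]
        self.index = 0

    def read_buggy(self, size):
        buf = self.buf  # local reference taken once, like a cached C pointer
        out = bytearray()
        for _ in range(size):
            out.append(buf[self.index])  # after a compaction, buf is the stale buffer
            self.index += 1
            if self.index >= self.GARBAGE_LIMIT:
                self._compact()  # self.buf is replaced, but the local buf is not
        return bytes(out)

    def read_fixed(self, size):
        out = self.buf[self.index:self.index + size]
        self.index += len(out)
        # Compact only once, after all reading is done.
        if self.index >= self.GARBAGE_LIMIT:
            self._compact()
        return out


data = b"ABCDEFGH"
print(SketchMemoryBuffer(data).read_buggy(6))
print(SketchMemoryBuffer(data).read_fixed(6))
```

In the buggy variant the index is reset to zero while the reader still indexes the old buffer, so the read restarts from the front of already-consumed data. Deferring the compaction keeps the buffer reference and the index in agreement for the whole read.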
[jira] [Created] (THRIFT-1638) TSocket constructor with socket will throw NPE on open
Jim Kerwood created THRIFT-1638:
-----------------------------------

             Summary: TSocket constructor with socket will throw NPE on open
                 Key: THRIFT-1638
                 URL: https://issues.apache.org/jira/browse/THRIFT-1638
             Project: Thrift
          Issue Type: Bug
          Components: Java - Library
    Affects Versions: 0.8, 0.9, 1.0, 1.1
            Reporter: Jim Kerwood
            Priority: Minor


When a TSocket is created with the TSocket(Socket s) constructor, the open() method throws an NPE when it checks host_.length().
[jira] [Created] (THRIFT-1637) NPM registry does not include version 0.8
B2M created THRIFT-1637:
---------------------------

             Summary: NPM registry does not include version 0.8
                 Key: THRIFT-1637
                 URL: https://issues.apache.org/jira/browse/THRIFT-1637
             Project: Thrift
          Issue Type: Bug
          Components: Node.js - Library
    Affects Versions: 0.8
            Reporter: B2M


Version 0.8 of node now errors if the sys module is used. This issue was fixed in version 0.8 of Thrift but only 0.7 is available in npm.