[jira] [Commented] (THRIFT-1023) Thrift encoding (UTF-8) issue with Ruby 1.9.2

2012-06-28 Thread Nathan Beyer (JIRA)

[ 
https://issues.apache.org/jira/browse/THRIFT-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403679#comment-13403679
 ] 

Nathan Beyer commented on THRIFT-1023:
--

Is there any additional design doc or code doc for the Ruby classes? I'm trying 
to discern what APIs are working with bytes and what APIs are working with 
characters.

For example, 
[Thrift::Transport::BaseTransport|http://svn.apache.org/repos/asf/thrift/trunk/lib/rb/lib/thrift/transport/base_transport.rb]
 - do the read methods return bytes or characters, and do the write methods 
accept bytes or characters? I'm assuming it's bytes, since the variables are 
named 'buf' as opposed to the 'str' names used in the protocol APIs, but I want 
to double-check.
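
To make the bytes-versus-characters distinction concrete, here is a minimal Ruby 1.9+ sketch. It is not taken from the Thrift sources; the sample value is borrowed from the test case in the issue description, and the variable names are purely illustrative.

```ruby
# Illustrative only: the same data viewed as characters and as bytes.
name = "Thorbj\u00F8rn"          # UTF-8 string; "\u00F8" is the character 'ø'

char_count = name.length          # counts characters
byte_count = name.bytesize        # counts bytes; 'ø' occupies two bytes in UTF-8

# A transport-style "buf" would carry the raw bytes, tagged as binary.
buf = name.dup.force_encoding(Encoding::BINARY)

puts "characters: #{char_count}, bytes: #{byte_count}"
puts "buf encoding: #{buf.encoding}, buf length: #{buf.length}"
```

Once the copy is tagged as binary, its length is counted in bytes, which is the view a byte-oriented transport would want.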

> Thrift encoding  (UTF-8) issue with Ruby 1.9.2
> --
>
> Key: THRIFT-1023
> URL: https://issues.apache.org/jira/browse/THRIFT-1023
> Project: Thrift
> Issue Type: Bug
> Components: Ruby - Library
> Affects Versions: 0.5
> Environment: OSX, Ruby 1.9.2, Thrift Gem version 0.5.0
> Reporter: Vincent Peres
> Assignee: Jake Farrell
> Attachments: thrift-1023-utf8-encoding-issue.path
>
>
> I ran into an encoding issue in the Thrift library, specifically in the 
> BufferedTransport class.
> I've written a few tests to give you a concrete example:
> # encoding: utf-8
> require 'spec_helper'
> describe "encoding" do
>   before do
>     transport = Thrift::BufferedTransport.new(Thrift::Socket.new(MR_CONFIG['host'], 9090))
>     protocol  = Thrift::BinaryProtocol.new(transport)
>     @client   = Apache::Hadoop::Hbase::Thrift::Hbase::Client.new(protocol)
>     transport.open()
>     @table_name = "encoding_test"
>     @column_family = "info:"
>   end
>   it "should create a new table" do
>     column = Apache::Hadoop::Hbase::Thrift::ColumnDescriptor.new{|c| c.name = @column_family}
>     @client.createTable(@table_name, [column]).should be_nil
>   end
>   it "should save standard caracteres" do
>     m = Apache::Hadoop::Hbase::Thrift::Mutation.new
>     m.column = "info:first_name"
>     m.value  = "Vincent"
>     m.value.encoding.should == Encoding::UTF_8
>     @client.mutateRow(@table_name, "ID1", [m]).should be_nil
>   end
>   it "should save UTF8 caracteres" do
>     m = Apache::Hadoop::Hbase::Thrift::Mutation.new
>     m.column = "info:first_name"
>     m.value  = "Thorbjørn"
>     m.value.encoding.should == Encoding::UTF_8
>     @client.mutateRow(@table_name, "ID1", [m]).should be_nil
>   end
>   it "should destroy the table" do
>     @client.disableTable(@table_name).should be_nil
>     @client.deleteTable(@table_name).should be_nil
>   end
> end
> It fails when it tries to save the UTF-8 string containing the character 'ø'.
> Here is the output:
>  1) encoding should save UTF8 caracteres
> Failure/Error: @client.mutateRow(@table_name, "ID1", [m]).should be_nil
> incompatible character encodings: ASCII-8BIT and UTF-8
> 
> # /Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/transport/buffered_transport.rb:59:in `write'
> # /Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/protocol/binary_protocol.rb:107:in `write_string'
> # /Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/client.rb:35:in `write'
> # /Users/vincentp/.rvm/gems/ruby-1.9.2-p0/gems/thrift-0.5.0/lib/thrift/client.rb:35:in `send_message'
> # ./lib/thrift/hbase.rb:289:in `send_mutateRow'
> # ./lib/thrift/hbase.rb:284:in `mutateRow'
> # ./spec/thrift/cases/encoding_spec.rb:37:in `block (2 levels) in <top (required)>'
> Let me know if you need any other details, thank you !
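
The error in the report can be reproduced without Thrift or HBase. The sketch below is an assumption about the general mechanism rather than a trace of the library's code: appending a UTF-8 string that contains a non-ASCII character to a binary (ASCII-8BIT) buffer that already holds a byte above 0x7F raises Encoding::CompatibilityError. The helper names append_raw and append_as_bytes are hypothetical, and retagging the value as binary is only one possible workaround, not necessarily the fix chosen for this issue.

```ruby
# Hypothetical helpers; they are not part of the Thrift API.

# Appends str to the buffer as-is. Fails when the two encodings clash.
def append_raw(wbuf, str)
  wbuf << str
end

# Appends the bytes of str after retagging a copy as binary.
def append_as_bytes(wbuf, str)
  wbuf << str.dup.force_encoding(Encoding::BINARY)
end

value = "Thorbj\u00F8rn"                  # UTF-8, contains the non-ASCII character 'ø'

begin
  append_raw([0x80].pack("C"), value)     # binary buffer holding one high byte
rescue Encoding::CompatibilityError => e
  puts "raised: #{e.message}"
end

wbuf = append_as_bytes([0x80].pack("C"), value)
puts "buffer now holds #{wbuf.bytesize} bytes, encoding #{wbuf.encoding}"
```

The same append succeeds for "Vincent" because an ASCII-only string is compatible with a binary buffer, which matches the report that only the 'ø' test fails.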

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (THRIFT-1632) ruby: data corruption in thrift_native implementation of MemoryBufferTransport

2012-06-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/THRIFT-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403646#comment-13403646
 ] 

Hudson commented on THRIFT-1632:


Integrated in Thrift #508 (See [https://builds.apache.org/job/Thrift/508/])
THRIFT-1632. rb: ruby: data corruption in thrift_native implementation of 
MemoryBufferTransport

This patch fixes a subtle bug whereby the read buffer was being resized but the 
method continued to read from the original, unresized buffer but at the wrong 
location. (Revision 1355198)

 Result = SUCCESS
bryanduxbury : http://svn.apache.org/viewvc/?view=rev&rev=1355198
Files : 
* /thrift/trunk/lib/rb/ext/memory_buffer.c


> ruby: data corruption in thrift_native implementation of MemoryBufferTransport
> --
>
> Key: THRIFT-1632
> URL: https://issues.apache.org/jira/browse/THRIFT-1632
> Project: Thrift
> Issue Type: Bug
> Components: Ruby - Library
> Affects Versions: 0.7, 0.8, 0.9
> Environment: Tested on Linux/CentOS 6.0, with thrift_native.so installed
> Reporter: Nevo Hed
> Assignee: Nevo Hed
> Labels: newbie, patch
> Fix For: 0.9
>
> Attachments: patch, test.rb, test.thrift
>
>
> Detected a failure when serializing, then deserializing a specific object
> (I think the object needs to be large enough, AND probably must have non zero 
> data at a specific offset)
> $ /usr/bin/thrift --gen rb test.thrift && ruby test.rb 
> Caught Thrift::ProtocolException exception: Invalid value of field x1!
> Trace:
>   ./gen-rb/test_types.rb:34:in `validate'
>   test.rb:15:in `read'
>   test.rb:15





[jira] [Commented] (THRIFT-1023) Thrift encoding (UTF-8) issue with Ruby 1.9.2

2012-06-28 Thread Nathan Beyer (JIRA)

[ 
https://issues.apache.org/jira/browse/THRIFT-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403634#comment-13403634
 ] 

Nathan Beyer commented on THRIFT-1023:
--

Any opinions on the level of Ruby 1.8 support that's needed? If $KCODE is 'U', 
this means all Strings are UTF-8 and the character data could just be treated 
as-is. This will be common, as frameworks like Rails force the interpreter into 
this mode at bootstrap. If $KCODE is 'N' (none/ASCII), 'E' (EUC) or 'S' (Shift 
JIS), then some form of transcoding would have to take place (via iconv?).

It would be fairly easy to support the Unicode/UTF-8 mode for Ruby 1.8. The 
other values would not be as easy to support. Do they need to be supported? 

Note - I'm guessing that none of the existing Thrift Ruby code actually works 
in anything other than $KCODE of 'U'; at least not for code points that are 
outside of the ASCII range (0-127).






[jira] [Closed] (THRIFT-1632) ruby: data corruption in thrift_native implementation of MemoryBufferTransport

2012-06-28 Thread Bryan Duxbury (JIRA)

 [ 
https://issues.apache.org/jira/browse/THRIFT-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Duxbury closed THRIFT-1632.
-

   Resolution: Fixed
Fix Version/s: 0.9

I just committed a slightly modified version of this to TRUNK. Thanks for 
finding the bug and for drafting the original patch, Nevo!





[jira] [Created] (THRIFT-1639) Java/Python: Serialization/Deserialization of double type using CompactProtocol

2012-06-28 Thread andrew watts (JIRA)
andrew watts created THRIFT-1639:


Summary: Java/Python: Serialization/Deserialization of double type using CompactProtocol
Key: THRIFT-1639
URL: https://issues.apache.org/jira/browse/THRIFT-1639
Project: Thrift
Issue Type: Bug
Components: Java - Library, Python - Library
Affects Versions: 0.8
Reporter: andrew watts


When using the Compact Protocol, double values are not properly 
serialized/deserialized with Java or Python.

I have a Java server that writes a Thrift blob to a log file as a base64-encoded 
string; Python then reads the base64-encoded string from the log. During 
development I discovered that double values are not deserialized properly.

On the mailing list there was speculation about a mismatch between the 
serialization and deserialization code in the two languages, and opening a 
ticket was recommended.


Example:
{code}
» java -cp .:../lib/libthrift-0.8.0.jar:../lib/slf4j-api-1.6.4.jar ThriftTest
fooObj.bar: 3.456
base64String: F9nO91PjpQtAAA==

» python ../python/tdouble_test.py
base64 string:  F9nO91PjpQtAAA==
foo_obj.bar:  -4.09406819342e+124  # expect 3.456
{code}
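
One way to probe the suspected mismatch is to decode the logged blob by hand. The sketch below uses only Ruby's built-in String#unpack and assumes the usual Compact Protocol layout for a struct with a single field: one field header byte (field-id delta in the high nibble, type in the low nibble, with type 7 meaning double), eight payload bytes, and a stop byte. The variable names are illustrative. It interprets the payload once as a little-endian and once as a big-endian IEEE 754 double.

```ruby
# Decode the base64 string from the example output by hand.
blob = "F9nO91PjpQtAAA==".unpack("m").first   # "m" = base64 decode

header  = blob.getbyte(0)        # field header byte
payload = blob.byteslice(1, 8)   # the 8 bytes of the double
stop    = blob.getbyte(9)        # trailing stop byte

field_delta = header >> 4        # high nibble: field-id delta
field_type  = header & 0x0F      # low nibble: compact type id

little_endian = payload.unpack("E").first   # "E" = little-endian double
big_endian    = payload.unpack("G").first   # "G" = big-endian double

puts "field delta=#{field_delta} type=#{field_type} stop=#{stop}"
puts "little-endian: #{little_endian}"
puts "big-endian:    #{big_endian}"
```

Read as little-endian, the payload is 3.456, the value the Java side wrote; read as big-endian, it is roughly -4.09e+124, the value the Python side printed. That is consistent with a byte-order disagreement between the two implementations, though it does not by itself show which side departs from the intended format.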


Thrift Definition:
{code}
struct FooObj {
1: double bar
}
{code}

Java Code:
{code}
import org.apache.thrift.TException;
import org.apache.thrift.TSerializer;
import org.apache.thrift.protocol.TCompactProtocol;

import javax.xml.bind.DatatypeConverter;

public class ThriftTest {

    public static void main(String[] args) {
        final TSerializer serializer = new TSerializer(new TCompactProtocol.Factory());

        // create a FooObj with double
        final FooObj fooObj = new FooObj(3.456);
        System.out.println("fooObj.bar: " + fooObj.bar);

        // serialize to bytes
        byte[] fooObjBlob;
        try {
            fooObjBlob = serializer.serialize(fooObj);
        } catch (TException e) {
            e.printStackTrace();
            return; // nothing to encode if serialization failed
        }

        // encode to base64 string
        final String base64String = DatatypeConverter.printBase64Binary(fooObjBlob);
        System.out.println("base64String: " + base64String);
    }
}
{code}


Python Code

{code}
#!/bin/env python

import base64

from thrift.protocol import TCompactProtocol
from thrift.TSerialization import deserialize

from foo.ttypes import FooObj


def main():

   protocol_factory = TCompactProtocol.TCompactProtocolFactory
   base64_string = 'F9nO91PjpQtAAA=='

   print 'base64 string: ', base64_string

   # deserialize the string back into an object
   foo_blob = base64.urlsafe_b64decode(base64_string)
   foo_obj = FooObj()
   deserialize(foo_obj, foo_blob, protocol_factory=protocol_factory())
   print 'foo_obj.bar: ', foo_obj.bar

if __name__ == '__main__':
   main()

{code}






[jira] [Commented] (THRIFT-1632) ruby: data corruption in thrift_native implementation of MemoryBufferTransport

2012-06-28 Thread Nevo Hed (JIRA)

[ 
https://issues.apache.org/jira/browse/THRIFT-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403298#comment-13403298
 ] 

Nevo Hed commented on THRIFT-1632:
--

Bryan,

Thanks for looking at this.  I definitely approached this from an outsider's 
perspective: I compared the two versions (C-ruby-lib vs ruby module) and 
tried to make the broken one behave like the one that does not seem to 
obliterate the data.

So my question is: is my change functionally different from what is already in 
the Ruby read_into_buffer() method? 
[lib/rb/lib/thrift/transport/memory_buffer_transport.rb]







[jira] [Commented] (THRIFT-1637) NPM registry does not include version 0.8

2012-06-28 Thread Dan Cromer (JIRA)

[ 
https://issues.apache.org/jira/browse/THRIFT-1637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403260#comment-13403260
 ] 

Dan Cromer commented on THRIFT-1637:


I, too, have hit this issue.



> NPM registry does not include version 0.8
> -
>
> Key: THRIFT-1637
> URL: https://issues.apache.org/jira/browse/THRIFT-1637
> Project: Thrift
> Issue Type: Bug
> Components: Node.js - Library
> Affects Versions: 0.8
> Reporter: B2M
>
> Version 0.8 of node now errors if the sys module is used. This issue was 
> fixed in version 0.8 of Thrift but only 0.7 is available in npm.





[jira] [Commented] (THRIFT-1632) ruby: data corruption in thrift_native implementation of MemoryBufferTransport

2012-06-28 Thread Bryan Duxbury (JIRA)

[ 
https://issues.apache.org/jira/browse/THRIFT-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403216#comment-13403216
 ] 

Bryan Duxbury commented on THRIFT-1632:
---

After staring at this for a bit, I think I've figured out the bug in this 
function and why your patch solves the issue.

The way that this is supposed to work is that when we've consumed enough of the 
memory buffer, we reallocate a new one without the used-up space in the front. 
This is intended to save memory. However, the issue is that when we decide to 
resize the buffer, we don't reassign the "buf" variable to the new buffer - 
we've still got a pointer to the old one. But we reset the index pointer as 
though we have switched, which means that when you start reading again, you'll 
be getting the wrong data.

Your patch fixes this issue by only doing a garbage resize once after all the 
reading is done, thus guaranteeing that the buffer pointer and its index remain 
valid throughout the lifetime of the method. 
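
The failure mode described above can be modeled in a few lines of Ruby. This is a simplified, hypothetical sketch, not the thrift_native C code: the class name SketchBuffer, the methods read_buggy and read_fixed, and the tiny GARBAGE_LIMIT threshold are all made up for illustration. The buggy variant compacts the buffer inside the read loop and resets the index but keeps reading through the old reference; the fixed variant compacts once after the loop, as the patch is described as doing.

```ruby
# Hypothetical model of the stale-buffer bug; not the actual C extension.
GARBAGE_LIMIT = 4   # made-up threshold, kept tiny so the bug shows quickly

class SketchBuffer
  def initialize(data)
    @buf = data.dup
    @index = 0
  end

  # Buggy: compacts mid-read, resets the index, but keeps the old reference.
  def read_buggy(size)
    buf = @buf
    index = @index
    out = []
    size.times do
      raise EOFError, "not enough bytes remain" if index >= buf.bytesize
      out << buf.getbyte(index)
      index += 1
      if index >= GARBAGE_LIMIT
        @buf = buf.byteslice(index..-1)   # new, shorter buffer is stored
        index = 0                         # index reset as if we had switched
        # `buf` still refers to the old, uncompacted buffer here
      end
    end
    @index = index
    out
  end

  # Fixed: read everything first, then compact once at the end.
  def read_fixed(size)
    buf = @buf
    index = @index
    out = []
    size.times do
      raise EOFError, "not enough bytes remain" if index >= buf.bytesize
      out << buf.getbyte(index)
      index += 1
    end
    if index >= GARBAGE_LIMIT
      @buf = buf.byteslice(index..-1)
      index = 0
    end
    @index = index
    out
  end
end

data  = "abcdefgh"
buggy = SketchBuffer.new(data).read_buggy(6)
fixed = SketchBuffer.new(data).read_fixed(6)
puts "buggy read: #{buggy.pack('C*')}"
puts "fixed read: #{fixed.pack('C*')}"
```

In this model the buggy read starts over at the front of the old buffer once the threshold is crossed, so the caller sees repeated bytes instead of the next ones, while the fixed read returns the bytes in order.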





[jira] [Created] (THRIFT-1638) TSocket constructor with socket will throw NPE on open

2012-06-28 Thread Jim Kerwood (JIRA)
Jim Kerwood created THRIFT-1638:
---

Summary: TSocket constructor with socket will throw NPE on open
Key: THRIFT-1638
URL: https://issues.apache.org/jira/browse/THRIFT-1638
Project: Thrift
Issue Type: Bug
Components: Java - Library
Affects Versions: 0.8, 0.9, 1.0, 1.1
Reporter: Jim Kerwood
Priority: Minor


When a TSocket is created with the TSocket(Socket s) constructor, the open() 
method throws an NPE when it checks host_.length().





[jira] [Created] (THRIFT-1637) NPM registry does not include version 0.8

2012-06-28 Thread B2M (JIRA)
B2M created THRIFT-1637:
---

Summary: NPM registry does not include version 0.8
Key: THRIFT-1637
URL: https://issues.apache.org/jira/browse/THRIFT-1637
Project: Thrift
Issue Type: Bug
Components: Node.js - Library
Affects Versions: 0.8
Reporter: B2M


Version 0.8 of node now errors if the sys module is used. This issue was fixed 
in version 0.8 of Thrift but only 0.7 is available in npm.
