[ https://issues.apache.org/jira/browse/AVRO-554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875764#action_12875764 ]

Jeff Hammerbacher commented on AVRO-554:
----------------------------------------

Hey,

Interesting problem here. It turns out that calling buffer.truncate(0) on a
StringIO buffer in Python 2 both clears the contents of the buffer and resets
the position to 0. For a file buffer, however, you need to explicitly call
buffer.seek(0) after buffer.truncate(0). I think Ruby's behavior is actually
more reasonable. For those who'd like to follow along at home, I've opened a
question on Quora to discover the source of this inconsistency in the Python
buffer API:
http://www.quora.com/Why-does-the-behavior-of-the-truncate-method-on-a-StringIO-object-in-Python-differ-from-the-truncate-method-on-a-file.

Later,
Jeff
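
A minimal sketch of the behavior discussed above, in Python 3. It contrasts clearing a buffer with truncate(0) alone against rewinding first with seek(0). One assumption about versions: the comment concerns Python 2's StringIO module, whereas in Python 3 io.BytesIO, io.StringIO, and real files all leave the stream position unchanged on truncate(). Writing after a bare truncate(0) therefore pads the gap with zero bytes. The helper names truncate_only, truncate_and_rewind, demo, and demo_on_temp_file are hypothetical and chosen only for illustration.

```python
import io
import tempfile


def truncate_only(buf):
    """Clear the data but leave the stream position where it was."""
    buf.truncate(0)


def truncate_and_rewind(buf):
    """Move the position back to 0, then clear the data."""
    buf.seek(0)
    buf.truncate(0)


def demo(buf, clear):
    """Write a block, clear the buffer with `clear`, write a second block,
    and return the buffer's full contents."""
    buf.write(b"first block")  # 11 bytes, position is now 11
    clear(buf)
    buf.write(b"2nd")          # lands at the current position
    buf.seek(0)
    return buf.read()


def demo_on_temp_file(clear):
    """Run the same demo against a real (temporary) file on disk."""
    with tempfile.TemporaryFile() as f:
        return demo(f, clear)


if __name__ == "__main__":
    print(demo(io.BytesIO(), truncate_only))        # 11 NUL bytes, then b'2nd'
    print(demo(io.BytesIO(), truncate_and_rewind))  # b'2nd'
    print(demo_on_temp_file(truncate_only))         # 11 NUL bytes, then b'2nd'
    print(demo_on_temp_file(truncate_and_rewind))   # b'2nd'
```

Calling seek(0) before truncate(0) gives the same result for in-memory buffers and real files, so it does not depend on which truncate semantics a particular buffer class implements.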

> data files created by ruby DataWriter are extremely large
> ---------------------------------------------------------
>
>                 Key: AVRO-554
>                 URL: https://issues.apache.org/jira/browse/AVRO-554
>             Project: Avro
>          Issue Type: Bug
>          Components: ruby
>    Affects Versions: 1.3.0, 1.4.0
>         Environment: avro-1.4.0-pre1, ruby 1.8.7 (2010-01-10 patchlevel 249) 
> [x86_64-linux]
>            Reporter: Grant Rodgers
>            Assignee: Grant Rodgers
>             Fix For: 1.3.3
>
>         Attachments: AVRO-554-2.patch, AVRO-554.patch, avro_comp.rb, 
> data10.avr, data100.avr, data3000.avr.gz, patched-data3000.avr
>
>
> Adding 10000 records of a very simple schema (3 fields) to a DataWriter 
> results in a file that is 317 MB. The same records in JSON are 430 KB.
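
A hypothetical simulation can show how a file can grow this large. It assumes a writer that encodes each block into one reused in-memory buffer, copies the buffer to the output, and then clears it. This is an assumption, not code taken from the Avro source. If the buffer is cleared with truncate(0) but never rewound, block k carries zero padding for every earlier block, so the output size grows roughly quadratically with the number of blocks. The function write_blocks and the block sizes are made up for illustration, and the numbers do not attempt to reproduce the 317 MB figure above.

```python
import io


def write_blocks(blocks, rewind):
    """Hypothetical writer: encode each block into a single reused buffer,
    append the buffer's contents to the output, then clear the buffer."""
    out = io.BytesIO()
    buf = io.BytesIO()
    for block in blocks:
        buf.write(block)            # written at the buffer's current position
        out.write(buf.getvalue())   # includes any zero padding before the block
        if rewind:
            buf.seek(0)
        buf.truncate(0)             # drops the data; position moves only if rewound
    return out.getvalue()


if __name__ == "__main__":
    blocks = [b"x" * 10] * 100
    print(len(write_blocks(blocks, rewind=True)))   # 1000
    print(len(write_blocks(blocks, rewind=False)))  # 50500
```

With 100 blocks of 10 bytes, the rewinding writer emits 100 × 10 = 1000 bytes. The non-rewinding one emits 10 × (1 + 2 + … + 100) = 50,500 bytes.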

-- 
This message is automatically generated by JIRA.