[ https://issues.apache.org/jira/browse/AVRO-892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Cooper updated AVRO-892:
--------------------------------

    Description:
The Python library for avro fails to write some blocks when used with snappy compression.
The error is:
{code}
Traceback (most recent call last):
  File "tools/json_to_avro.py", line 74, in <module>
    writer.append(line)
  File "/home/michaelc/.python/2.7/avro-1.5.4-py2.7.egg/avro/datafile.py", line 185, in append
    self._write_block()
  File "/home/michaelc/.python/2.7/avro-1.5.4-py2.7.egg/avro/datafile.py", line 169, in _write_block
    self.encoder.write_crc32(uncompressed_data)
  File "/home/michaelc/.python/2.7/avro-1.5.4-py2.7.egg/avro/io.py", line 364, in write_crc32
    self.write(STRUCT_CRC32.pack(crc32(bytes)));
struct.error: integer out of range for 'I' format code
{code}
From my investigation, str(crc32(bytes)) is showing negative integers, so the issue seems to be fixed by masking the output.
This fix appears to work from my limited testing:
{code}
--- io.old.py	2011-09-21 14:32:38.992544680 +1000
+++ io.py	2011-09-21 14:33:11.492544686 +1000
@@ -360,7 +360,7 @@
     """
     A 4-byte, big-endian CRC32 checksum
     """
-    self.write(STRUCT_CRC32.pack(crc32(bytes)));
+    self.write(STRUCT_CRC32.pack(crc32(bytes) & 0xffffffff));
 
 #
 # DatumReader/Writer
{code}

  was:
The Python library for avro fails to write some blocks when used with snappy compression.
The error is:
Traceback (most recent call last):
  File "tools/json_to_avro.py", line 74, in <module>
    writer.append(line)
  File "/home/michaelc/.python/2.7/avro-1.5.4-py2.7.egg/avro/datafile.py", line 185, in append
    self._write_block()
  File "/home/michaelc/.python/2.7/avro-1.5.4-py2.7.egg/avro/datafile.py", line 169, in _write_block
    self.encoder.write_crc32(uncompressed_data)
  File "/home/michaelc/.python/2.7/avro-1.5.4-py2.7.egg/avro/io.py", line 364, in write_crc32
    self.write(STRUCT_CRC32.pack(crc32(bytes)));
struct.error: integer out of range for 'I' format code
From my investigation, str(crc32(bytes)) is showing negative integers, so the issue seems to be fixed by masking the output.
This fix appears to work from my limited testing:
--- io.old.py	2011-09-21 14:32:38.992544680 +1000
+++ io.py	2011-09-21 14:33:11.492544686 +1000
@@ -360,7 +360,7 @@
     """
     A 4-byte, big-endian CRC32 checksum
     """
-    self.write(STRUCT_CRC32.pack(crc32(bytes)));
+    self.write(STRUCT_CRC32.pack(crc32(bytes) & 0xffffffff));
 
 #
 # DatumReader/Writer


> Python snappy error: "integer out of range for 'I' format code"
> ---------------------------------------------------------------
>
>                 Key: AVRO-892
>                 URL: https://issues.apache.org/jira/browse/AVRO-892
>             Project: Avro
>          Issue Type: Bug
>          Components: python
>    Affects Versions: 1.5.4
>         Environment: Linux michaelc 2.6.38-11-generic #48-Ubuntu SMP Fri Jul 29 19:02:55 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
> Ubuntu 11.04
> Python 2.7.1+ (ubuntu stock version)
> avro-1.5.4-py2.7.egg
> snappy-1.0.4 (c library)
> python-snappy-0.3.2
>            Reporter: Michael Cooper
>
> The Python library for avro fails to write some blocks when used with snappy
> compression.
> The error is:
> {code}
> Traceback (most recent call last):
>   File "tools/json_to_avro.py", line 74, in <module>
>     writer.append(line)
>   File "/home/michaelc/.python/2.7/avro-1.5.4-py2.7.egg/avro/datafile.py", line 185, in append
>     self._write_block()
>   File "/home/michaelc/.python/2.7/avro-1.5.4-py2.7.egg/avro/datafile.py", line 169, in _write_block
>     self.encoder.write_crc32(uncompressed_data)
>   File "/home/michaelc/.python/2.7/avro-1.5.4-py2.7.egg/avro/io.py", line 364, in write_crc32
>     self.write(STRUCT_CRC32.pack(crc32(bytes)));
> struct.error: integer out of range for 'I' format code
> {code}
> From my investigation, str(crc32(bytes)) is showing negative integers, so the
> issue seems to be fixed by masking the output.
> This fix appears to work from my limited testing:
> {code}
> --- io.old.py	2011-09-21 14:32:38.992544680 +1000
> +++ io.py	2011-09-21 14:33:11.492544686 +1000
> @@ -360,7 +360,7 @@
>      """
>      A 4-byte, big-endian CRC32 checksum
>      """
> -    self.write(STRUCT_CRC32.pack(crc32(bytes)));
> +    self.write(STRUCT_CRC32.pack(crc32(bytes) & 0xffffffff));
>  
>  #
>  # DatumReader/Writer
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
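A minimal sketch of why the masking fix works, assuming Python 3's standard zlib and struct modules. On Python 2, crc32 returns a signed value in [-2**31, 2**31 - 1], so any checksum with the high bit set comes out negative, and an unsigned 'I' format rejects it. Python 3's zlib.crc32 is already unsigned, so the sketch rebuilds the Python 2 style value with a hypothetical helper, to_signed32. Defining STRUCT_CRC32 as struct.Struct('>I') is also an assumption, chosen to match the "4-byte, big-endian CRC32 checksum" docstring in the patch; it is not taken from the report.

```python
import struct
import zlib

# Assumption: big-endian unsigned 32-bit format, matching the
# "4-byte, big-endian CRC32 checksum" docstring in the patch.
STRUCT_CRC32 = struct.Struct('>I')


def to_signed32(value):
    """Hypothetical helper: reinterpret an unsigned 32-bit value as signed,
    i.e. the range [-2**31, 2**31 - 1] that Python 2's crc32 returned."""
    return value - (1 << 32) if value >= (1 << 31) else value


def pack_crc32(data):
    """Pack the CRC32 of `data`, masking to 32 unsigned bits first."""
    return STRUCT_CRC32.pack(zlib.crc32(data) & 0xffffffff)


data = b"123456789"                  # standard CRC-32 check input (0xCBF43926)
py2_style = to_signed32(zlib.crc32(data))
print(py2_style)                     # -873187034: high bit set, so negative

try:
    STRUCT_CRC32.pack(py2_style)     # unmasked: raises struct.error
except struct.error as exc:
    print("struct.error:", exc)      # message text varies by Python version

print(STRUCT_CRC32.pack(py2_style & 0xffffffff).hex())  # cbf43926
print(pack_crc32(data).hex())                           # cbf43926
```

Masking with 0xffffffff leaves any value already in [0, 2**32) unchanged, so the same line should be safe on interpreters where crc32 already returns an unsigned result.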