Spencer Williams created AVRO-3480:
--------------------------------------

             Summary: Avro files with multiple "blocks" fail to deserialize 
from a file when using the DEFLATE codec
                 Key: AVRO-3480
                 URL: https://issues.apache.org/jira/browse/AVRO-3480
             Project: Apache Avro
          Issue Type: Bug
          Components: php
    Affects Versions: 1.11.0
            Reporter: Spencer Williams
         Attachments: repro_java_create_problematic_avro_file.zip, test.avro

When deserializing in PHP a file that uses the DEFLATE codec and contains a 
large number of records (see the attached example file, 20,000 records), the 
`$decoder` instance advances through the file incorrectly and eventually 
yields an empty string, which is passed into `gzinflate(...)` on this line: 
[https://github.com/apache/avro/blob/a6f13b269a359d3839e55a75e0662d834d76992c/lang/php/lib/DataFile/AvroDataIOReader.php#L176]

This raises a PHP error. Notably, not all records have been deserialized at 
the point where this happens, so the failure appears to be related to the 
file containing multiple "blocks".
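
For reference, the failing call can be reproduced in isolation. The snippet 
below is a minimal sketch, not taken from the Avro library: it assumes only 
the stock zlib extension, and the payload string is made up for illustration. 
It shows that a raw DEFLATE round trip works, while an empty string is 
rejected by `gzinflate()` with a warning and a `false` return value. The exact 
error text seen with the attached file is not quoted in this report.

```php
<?php
declare(strict_types=1);

// Raw DEFLATE round trip: roughly what a well-formed block payload looks
// like to the reader. The payload itself is a made-up placeholder.
$payload = str_repeat('avro-record;', 100);
$compressed = gzdeflate($payload);
$roundTrip = gzinflate($compressed);

// An empty string is not a valid DEFLATE stream: gzinflate() emits an
// E_WARNING and returns false. The @ operator only silences the warning
// so that the return value can be inspected.
$empty = @gzinflate('');

var_dump($roundTrip === $payload); // bool(true)
var_dump($empty);                  // bool(false)
```

So if the reader's position in the file drifts and it hands `gzinflate()` a 
zero-length payload, a failure of this kind is the expected outcome.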

I've attached a file that meets this condition, and also a quick Kotlin project 
using the official Java library that I used to generate the file.
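
To check that a given file really does contain more than one block, the 
container layout can be walked by hand without the Avro library. The 
following is a diagnostic sketch only, based on the object container file 
layout in the Avro specification (magic bytes, metadata map, 16-byte sync 
marker, then blocks of record count, byte size, payload, sync marker). The 
function names `readLong`, `readBytes` and `summarizeBlocks` are hypothetical 
helpers and not part of avro-php, and the varint decoder assumes values that 
fit comfortably in a PHP integer.

```php
<?php
declare(strict_types=1);

/** Decode one zigzag varint (an Avro "long") at $pos, advancing $pos. */
function readLong(string $buf, int &$pos): int
{
    $n = 0;
    $shift = 0;
    do {
        if ($pos >= strlen($buf)) {
            throw new RuntimeException('Unexpected end of data in a long');
        }
        $b = ord($buf[$pos++]);
        $n |= ($b & 0x7f) << $shift;
        $shift += 7;
    } while ($b & 0x80);
    return ($n >> 1) ^ -($n & 1); // undo zigzag encoding
}

/** Return $len raw bytes starting at $pos, advancing $pos. */
function readBytes(string $buf, int &$pos, int $len): string
{
    if ($len < 0 || $pos + $len > strlen($buf)) {
        throw new RuntimeException("Cannot read $len bytes at offset $pos");
    }
    $s = substr($buf, $pos, $len);
    $pos += $len;
    return $s;
}

/** List the record count and payload size of every block in the file. */
function summarizeBlocks(string $buf): array
{
    $pos = 0;
    if (readBytes($buf, $pos, 4) !== "Obj\x01") {
        throw new RuntimeException('Not an Avro object container file');
    }
    // Skip the header metadata map (string keys, bytes values).
    while (($n = readLong($buf, $pos)) !== 0) {
        if ($n < 0) {            // negative count is followed by a byte size
            $n = -$n;
            readLong($buf, $pos);
        }
        for ($i = 0; $i < $n; $i++) {
            $keyLen = readLong($buf, $pos);
            readBytes($buf, $pos, $keyLen);
            $valueLen = readLong($buf, $pos);
            readBytes($buf, $pos, $valueLen);
        }
    }
    $sync = readBytes($buf, $pos, 16);

    $blocks = [];
    while ($pos < strlen($buf)) {
        $records = readLong($buf, $pos);
        $bytes = readLong($buf, $pos);
        readBytes($buf, $pos, $bytes);   // payload, still compressed
        if (readBytes($buf, $pos, 16) !== $sync) {
            throw new RuntimeException('Sync marker mismatch');
        }
        $blocks[] = ['records' => $records, 'bytes' => $bytes];
    }
    return $blocks;
}
```

Calling `summarizeBlocks(file_get_contents('test.avro'))` on the attachment 
should return one entry per block, which makes it easy to see how many blocks 
the reader has to cross before it fails.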

The PHP code in question to reproduce this behavior is pretty standard, lifted 
directly from the provided {{examples/write_read.php}} file:

 

{{<?php}}
{{if (count($argv) < 2) {}}
{{    echo "USAGE: php main.php FILENAME";}}
{{    exit(1);}}
{{}}}
{{$filename = $argv[1];}}

{{require_once __DIR__ . '/../vendor/avro-php-1.11.0/lib/autoload.php';}}

{{use Apache\Avro\DataFile\AvroDataIO;}}

{{$data_reader = AvroDataIO::openFile($filename);}}
{{echo "Reading from $filename:\n";}}
{{foreach ($data_reader->data() as $datum) {}}
{{    echo var_export($datum, true) . "\n";}}
{{}}}
{{$data_reader->close();}}

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
