Spencer Williams created AVRO-3480:
--------------------------------------
Summary: Avro files with multiple "blocks" fail to deserialize
from a file when using the DEFLATE codec
Key: AVRO-3480
URL: https://issues.apache.org/jira/browse/AVRO-3480
Project: Apache Avro
Issue Type: Bug
Components: php
Affects Versions: 1.11.0
Reporter: Spencer Williams
Attachments: repro_java_create_problematic_avro_file.zip, test.avro
When the PHP library deserializes a DEFLATE-compressed file containing a large
number of records (see the attached example file, 20,000 records), the
`$decoder` instance advances through the file incorrectly and eventually
yields an empty string, which is passed into `gzinflate(...)` on this line:
[https://github.com/apache/avro/blob/a6f13b269a359d3839e55a75e0662d834d76992c/lang/php/lib/DataFile/AvroDataIOReader.php#L176]
...which raises a PHP error. Notably, not all records have been deserialized
at that point, so the failure appears to be related to the file containing
multiple "blocks".
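To illustrate the symptom in isolation, here is a minimal sketch that uses only PHP's zlib functions. The `inflate_block` helper is hypothetical and not part of the Avro library. It assumes that an empty payload can never be a valid raw DEFLATE stream, so an empty string reaching `gzinflate(...)` most likely means the reader is misaligned within the file:

```php
<?php
// Hypothetical helper (not part of the Avro library): inflate one block
// payload and report clearly when the input cannot be valid raw DEFLATE.
function inflate_block(string $compressed): string
{
    // Even an empty input deflates to a few bytes, so an empty payload
    // indicates the reader handed us the wrong span of the file.
    if ($compressed === '') {
        throw new RuntimeException('Empty block payload; the reader is probably misaligned.');
    }

    // gzinflate() returns false and emits a warning on malformed input;
    // the warning is suppressed here and converted into an exception.
    $inflated = @gzinflate($compressed);
    if ($inflated === false) {
        throw new RuntimeException('gzinflate() failed on ' . strlen($compressed) . ' bytes.');
    }

    return $inflated;
}

// A raw-DEFLATE round trip succeeds.
echo inflate_block(gzdeflate('twenty thousand records')), "\n";

// An empty payload is reported explicitly instead of surfacing as a warning.
try {
    inflate_block('');
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

A guard like this would not fix the underlying misalignment, but it would turn the failure into a clearer error message.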
I've attached a file that meets this condition, and also a quick Kotlin project
using the official Java library that I used to generate the file.
The PHP code in question to reproduce this behavior is pretty standard, lifted
directly from the provided {{examples/write_read.php}} file:
{code:php}
<?php

if (count($argv) < 2) {
    echo "USAGE: php main.php FILENAME";
    exit(1);
}

$filename = $argv[1];

require_once __DIR__ . '/../vendor/avro-php-1.11.0/lib/autoload.php';

use Apache\Avro\DataFile\AvroDataIO;

$data_reader = AvroDataIO::openFile($filename);

echo "Reading from $filename:\n";
foreach ($data_reader->data() as $datum) {
    echo var_export($datum, true) . "\n";
}
$data_reader->close();
{code}