[ https://issues.apache.org/jira/browse/AVRO-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17945751#comment-17945751 ]
ASF subversion and git services commented on AVRO-4090:
-------------------------------------------------------
Commit d5d5466d8d8a36fcbecbc924515174638f7ad515 in avro's branch
refs/heads/main from Thiago Romão Barcala
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=d5d5466d8d ]
AVRO-4090: Avoid repeating data validation (#3241)
> PHP data is validated multiple times for nested schemas
> -------------------------------------------------------
>
> Key: AVRO-4090
> URL: https://issues.apache.org/jira/browse/AVRO-4090
> Project: Apache Avro
> Issue Type: Improvement
> Reporter: Thiago Romão Barcala
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Consider the test script below:
> {code:php}
> <?php
> use Apache\Avro\Datum\AvroIOBinaryEncoder;
> use Apache\Avro\Datum\AvroIODatumWriter;
> use Apache\Avro\IO\AvroStringIO;
> use Apache\Avro\Schema\AvroSchema;
> require_once 'vendor/autoload.php';
> $writer = new AvroIODatumWriter();
> $schemaJson = <<<'JSON'
> {
>   "type": "record",
>   "name": "A",
>   "fields": [
>     {
>       "name": "a",
>       "type": {
>         "type": "record",
>         "name": "B",
>         "fields": [
>           {
>             "name": "b",
>             "type": {
>               "type": "record",
>               "name": "C",
>               "fields": [
>                 {
>                   "name": "c",
>                   "type": {
>                     "type": "record",
>                     "name": "D",
>                     "fields": [
>                       {
>                         "name": "d",
>                         "type": {
>                           "type": "record",
>                           "name": "E",
>                           "fields": [
>                             {
>                               "name": "e",
>                               "type": "string"
>                             }
>                           ]
>                         }
>                       }
>                     ]
>                   }
>                 }
>               ]
>             }
>           }
>         ]
>       }
>     }
>   ]
> }
> JSON
> ;
> $data = ['a' => ['b' => ['c' => ['d' => ['e' => 'value']]]]];
> $schema = AvroSchema::parse($schemaJson);
> $io = new AvroStringIO();
> $writer->writeData($schema, $data, new AvroIOBinaryEncoder($io));
> var_dump($io->__toString());
> {code}
> Running the script above with the command line below and inspecting the
> profiler output shows that the method AvroSchema::isValidDatum is called
> 21 times:
> {code:bash}
> php -dxdebug.start_with_request=true -dxdebug.mode=profile \
>   -dxdebug.output_dir=$(pwd) test.php
> {code}
> Validation should run only 6 times, though: once for each of the five
> records and once for the string value. The extra calls happen because
> writeData is called for every field of a record, and each writeData call
> validates the entire data graph below that field, so the nested levels are
> validated 6 + 5 + 4 + 3 + 2 + 1 = 21 times in total.
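
The call counts in the description can be reproduced with a small model that does not depend on the Avro library. This is only a sketch under stated assumptions: ValidationCounter, isValid, writeValidating, writeUnchecked and writeValidatingOnce are hypothetical names, not the Avro PHP API, and the "encoding" is reduced to string concatenation. It only illustrates why validating inside every recursive write repeats work, and how validating once at the entry point avoids it.

```php
<?php
// Hypothetical model, not the Avro PHP API. A schema is either the string
// 'string' or ['type' => 'record', 'fields' => [fieldName => schema]].

final class ValidationCounter
{
    public static $calls = 0; // incremented on every isValid() call
}

// Recursively checks a datum against a schema, counting each invocation.
function isValid($schema, $datum): bool
{
    ValidationCounter::$calls++;
    if ($schema === 'string') {
        return is_string($datum);
    }
    if (!is_array($datum)) {
        return false;
    }
    foreach ($schema['fields'] as $name => $fieldSchema) {
        if (!array_key_exists($name, $datum) || !isValid($fieldSchema, $datum[$name])) {
            return false;
        }
    }
    return true;
}

// Pattern described in the issue: every write validates its whole subtree.
function writeValidating($schema, $datum): string
{
    if (!isValid($schema, $datum)) {
        throw new InvalidArgumentException('datum does not match schema');
    }
    if ($schema === 'string') {
        return $datum;
    }
    $out = '';
    foreach ($schema['fields'] as $name => $fieldSchema) {
        $out .= writeValidating($fieldSchema, $datum[$name]);
    }
    return $out;
}

// Writes without any validation; callers must have validated already.
function writeUnchecked($schema, $datum): string
{
    if ($schema === 'string') {
        return $datum;
    }
    $out = '';
    foreach ($schema['fields'] as $name => $fieldSchema) {
        $out .= writeUnchecked($fieldSchema, $datum[$name]);
    }
    return $out;
}

// Alternative: validate once at the entry point, then write unchecked.
function writeValidatingOnce($schema, $datum): string
{
    if (!isValid($schema, $datum)) {
        throw new InvalidArgumentException('datum does not match schema');
    }
    return writeUnchecked($schema, $datum);
}

// Build the nested records A > B > C > D > E > string from the inside out.
$schema = 'string';
foreach (['e', 'd', 'c', 'b', 'a'] as $field) {
    $schema = ['type' => 'record', 'fields' => [$field => $schema]];
}
$data = ['a' => ['b' => ['c' => ['d' => ['e' => 'value']]]]];

ValidationCounter::$calls = 0;
writeValidating($schema, $data);
$naiveCalls = ValidationCounter::$calls;      // 6 + 5 + 4 + 3 + 2 + 1 = 21

ValidationCounter::$calls = 0;
writeValidatingOnce($schema, $data);
$singlePassCalls = ValidationCounter::$calls; // 5 records + 1 string = 6

echo "validate in every write: $naiveCalls\n";
echo "validate once: $singlePassCalls\n";
```

Splitting the writer into a validating entry point and an unchecked recursive helper is one way to get the 6-call behaviour; whether the actual commit is structured this way is not stated in this message.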