[
https://issues.apache.org/jira/browse/AVRO-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937960#comment-13937960
]
John Karp commented on AVRO-1470:
---------------------------------
Since everything is evaluable to true or false in Perl, including undef, I
think that might be problematic. If you had a union of [boolean, null], there's
no value you can specify to get the null branch. If you have a union of
[boolean, string], any string specified will be encoded as a true boolean,
except for the empty string which would be false. And if someone is encoding a
populated hash or array as a boolean type, there's a good chance it's a mistake,
and it would be more useful to produce an error than to encode the whole thing
as a true value.
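
To make the ambiguity concrete, here is a minimal sketch of what a "truthy means true" rule would see. The helper as_truthy_boolean is hypothetical and is not code from BinaryEncoder.pm or Schema.pm; it only mirrors the ternary test on $data.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Avro): under a "truthy means true" rule,
# every Perl value maps to a valid boolean, so nothing can be rejected.
sub as_truthy_boolean {
    my ($data) = @_;
    return $data ? 1 : 0;
}

my @samples = (
    [ 'undef'          => undef ],
    [ 'empty string'   => '' ],
    [ 'string "0"'     => '0' ],
    [ 'string "hello"' => 'hello' ],
    [ 'populated hash' => { a => 1 } ],
    [ 'empty array'    => [] ],
);

for my $sample (@samples) {
    my ($label, $value) = @$sample;
    printf "%-15s => %d\n", $label, as_truthy_boolean($value);
}
```

Every sample, including undef and the hash reference, comes back as a usable boolean, so a resolver for [boolean, null] or [boolean, string] would never have a reason to pick the second branch.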
Another approach would be to use '1' and the empty string as true and false,
since they're the closest thing in Perl to canonical boolean values. ('' eq !!0
and '1' eq !!1). However, deserializing false as the empty string might be
confusing to some users.
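
A small sketch of what those canonical values look like in practice. It uses only core Perl; the variable names are just for illustration.

```perl
use strict;
use warnings;

my $false = !!0;   # Perl's built-in false value
my $true  = !!1;   # Perl's built-in true value

print "false is the empty string\n" if $false eq '';
print "true is the string '1'\n"    if $true  eq '1';

# The built-in false value is also numerically zero.
print "false is numerically zero\n" if $false == 0;

# A deserialized false would print as nothing at all, which is the
# part that might surprise users.
printf "length of false: %d\n", length $false;
```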
Another approach that avoids all the above problems would be to require a
special class for representing booleans, for example:
http://search.cpan.org/~mattp/Data-Perl-0.001000/lib/Data/Perl/Bool.pm
The downsides are that it would be another dependency, and you couldn't
directly pass the result of a boolean test into the serializer without having
to wrap it first.
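
As a rough sketch of how the wrapper approach could look: My::Bool and encode_boolean below are hypothetical names made up for illustration. This is not the Data::Perl::Bool API and not the Avro encoder, just the general shape of the idea.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical stand-in for a dedicated boolean class.
{
    package My::Bool;

    sub new {
        my ($class, $value) = @_;
        my $flag = $value ? 1 : 0;
        return bless \$flag, $class;
    }

    sub value { ${ $_[0] } }
}

# Hypothetical encoder: accepts only My::Bool objects and rejects
# everything else instead of guessing.
sub encode_boolean {
    my ($data) = @_;
    die "boolean must be a My::Bool object\n"
        unless blessed($data) && $data->isa('My::Bool');
    return $data->value ? "\x01" : "\x00";
}

my $x = 5;

# The result of a boolean test has to be wrapped before encoding.
my $encoded = encode_boolean( My::Bool->new($x > 3) );
print unpack('H*', $encoded), "\n";

# A bare value is refused rather than silently treated as true.
my $ok = eval { encode_boolean($x > 3); 1 };
print "unwrapped value rejected\n" unless $ok;
```

With something like this, a union resolver can tell a boolean from a string or a null unambiguously, at the cost of the extra wrapping step at every call site.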
> Perl API boolean type misencoded
> --------------------------------
>
> Key: AVRO-1470
> URL: https://issues.apache.org/jira/browse/AVRO-1470
> Project: Avro
> Issue Type: Bug
> Components: perl
> Reporter: John Karp
> Assignee: John Karp
> Attachments: AVRO-1470.patch
>
>
> h1. Boolean Serialization
> The boolean serialization code in BinaryEncoder.pm is:
> {noformat}
> $data ? \0x1 : \0x0
> {noformat}
> The intent is that anything false to Perl, such as 0, '0', '', () and undef, is
> encoded as zero, and everything else is encoded as one. However, this code
> doesn't work, as these unit tests indicate:
> {noformat}
> primitive_ok boolean => 0, "\x0";
> primitive_ok boolean => 1, "\x1";
> {noformat}
> which print:
> {noformat}
> # Failed test 'primitive boolean encoded correctly'
> # at t/02_bin_encode.t line 40.
> # got: '30'
> # expected: '00'
> # Failed test 'primitive boolean encoded correctly'
> # at t/02_bin_encode.t line 40.
> # got: '31'
> # expected: '01'
> {noformat}
> h1. Booleans in Unions
> Inconsistently with the above serialization, the code in Schema.pm that
> determines which union branch to use attempts to check for boolean-ness
> with:
> {noformat}
> m{yes|no|y|n|t|f|true|false}i
> {noformat}
> meaning only those particular strings are considered booleans; however, they
> will all get encoded as '0' by BinaryEncoder.pm.
> I say 'attempts' because it's actually matching this regex against the data
> type name $type, which in this context will always be 'boolean', instead of
> the value $data.
> h1. Suggested Fix
> Perl has no boolean type, so there's no ideal solution for the inconsistency.
> But we could keep it simple, and have only the numbers 0 and 1 accepted as
> boolean values.