[ 
https://issues.apache.org/jira/browse/AVRO-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937960#comment-13937960
 ] 

John Karp commented on AVRO-1470:
---------------------------------

Since every value in Perl, including undef, evaluates to true or false, I 
think that might be problematic. If you had a union of [boolean, null], there's 
no value you could specify to get the null branch. If you had a union of 
[boolean, string], any string specified would be encoded as a true boolean, 
except for the empty string and '0', which would be false. And if someone is 
encoding a populated hash or array as a boolean type, there's a good chance 
it's a mistake, and it would be more useful to produce an error than to encode 
the whole thing as a true value.
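
To make the concern concrete, here is a minimal sketch of what a purely 
truthiness-based encoder would emit for a few inputs. The helper name 
truthiness_encode is hypothetical and is not part of the Avro Perl API; it only 
mirrors the "anything true becomes 1" rule discussed above.

```perl
use strict;
use warnings;

# Hypothetical helper (not in Avro): encode by plain Perl truthiness.
sub truthiness_encode {
    my ($value) = @_;
    return $value ? "\x01" : "\x00";
}

my @samples = (
    [ 'undef',           undef ],
    [ 'empty string',    '' ],
    [ "string '0'",      '0' ],
    [ "string 'false'",  'false' ],
    [ 'populated hash',  { a => 1 } ],
    [ 'empty array ref', [] ],
);

for my $sample (@samples) {
    my ($label, $value) = @$sample;
    # Show the single byte that would be written for each input.
    printf "%-16s => 0x%02x\n", $label, ord( truthiness_encode($value) );
}
```

The first three inputs come out as 0x00 and the last three as 0x01, so undef 
can never reach a null branch and the string 'false' is written as true.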

Another approach would be to use '1' and the empty string as true and false, 
since they're the closest thing in Perl to canonical boolean values. ('' eq !!0 
and '1' eq !!1). However, deserializing false as the empty string might be 
confusing to some users.
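
A short sketch of what that would look like in practice. The function 
decode_boolean_byte is a hypothetical name used only for illustration, not the 
existing decoder.

```perl
use strict;
use warnings;

my $true  = !!1;
my $false = !!0;

# As strings, the results of negation are '1' and ''.
print "true is '1'\n" if $true  eq '1';
print "false is ''\n" if $false eq '';

# The false value is also numerically 0, and using it as a number does not warn.
print "false + 0 = ", $false + 0, "\n";

# Hypothetical decoder: map an encoded byte to Perl's canonical booleans.
sub decode_boolean_byte {
    my ($byte) = @_;
    return ord($byte) ? !!1 : !!0;
}

my $decoded = decode_boolean_byte("\x00");
print "decoded false prints as: '$decoded'\n";
```

The last line prints an empty pair of quotes, which is the potentially 
confusing part: a decoded false is invisible when printed.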

Another approach that avoids all the above problems would be to require a 
special class for representing booleans, for example:
http://search.cpan.org/~mattp/Data-Perl-0.001000/lib/Data/Perl/Bool.pm
The downsides are that it would be another dependency, and you couldn't 
directly pass the result of a boolean test into the serializer without having 
to wrap it first.
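
For illustration only, a minimal sketch of the wrapper idea. Local::Bool and 
is_boolean are hypothetical names; this is not the Data::Perl::Bool interface, 
just the smallest class that lets an encoder tell booleans apart from other 
scalars.

```perl
use strict;
use warnings;

{
    # Hypothetical minimal boolean wrapper class.
    package Local::Bool;

    sub new {
        my ($class, $value) = @_;
        my $flag = $value ? 1 : 0;    # normalize to 0 or 1
        return bless \$flag, $class;
    }

    sub value {
        my ($self) = @_;
        return $$self;
    }
}

# Hypothetical check an encoder could use to select the boolean branch of a union.
sub is_boolean {
    my ($data) = @_;
    my $matches = ref($data) eq 'Local::Bool';
    return $matches ? 1 : 0;
}

my $count = 5;
# The result of a boolean test has to be wrapped before it is recognized.
my $wrapped = Local::Bool->new( $count > 3 );

print "wrapped test is boolean: ", is_boolean($wrapped), "\n";
print "bare test is boolean:    ", is_boolean( $count > 3 ), "\n";
print "wrapped value:           ", $wrapped->value, "\n";
```

With this, undef and plain strings no longer match the boolean branch, at the 
cost of the explicit wrapping step shown in the usage lines.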

> Perl API boolean type misencoded
> --------------------------------
>
>                 Key: AVRO-1470
>                 URL: https://issues.apache.org/jira/browse/AVRO-1470
>             Project: Avro
>          Issue Type: Bug
>          Components: perl
>            Reporter: John Karp
>            Assignee: John Karp
>         Attachments: AVRO-1470.patch
>
>
> h1. Boolean Serialization
> The boolean serialization code in BinaryEncoder.pm is:
> {noformat}
> $data ? \0x1 : \0x0
> {noformat}
> intending that anything false to Perl, such as 0, '0', '', () and undef, is 
> encoded as zero, and everything else is encoded as one. However, this code 
> doesn't work, as these unit tests indicate:
> {noformat}
> primitive_ok boolean => 0, "\x0";
> primitive_ok boolean => 1, "\x1";
> {noformat}
> which print:
> {noformat}
> #   Failed test 'primitive boolean encoded correctly'
> #   at t/02_bin_encode.t line 40.
> #          got: '30'
> #     expected: '00'
> #   Failed test 'primitive boolean encoded correctly'
> #   at t/02_bin_encode.t line 40.
> #          got: '31'
> #     expected: '01'
> {noformat}
> h1. Booleans in Unions
> Inconsistent with the above serialization, the code used in Schema.pm to 
> determine which union branch to use is attempting to check for boolean-ness 
> with:
> {noformat}
> m{yes|no|y|n|t|f|true|false}i
> {noformat}
> meaning only those particular strings are considered booleans; however, they 
> will all get encoded as '0' by BinaryEncoder.pm.
> I say 'attempting' because it's actually matching this regex against the data 
> type name $type, which in this context will always be 'boolean', instead of 
> the value $data.
> h1. Suggested Fix
> Perl has no boolean type, so there's no ideal solution for the inconsistency. 
> But we could keep it simple, and have only the numbers 0 and 1 accepted as 
> boolean values.
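
A minimal sketch of the "only 0 and 1" rule from the suggested fix, assuming an 
encoder that returns the byte directly. The name encode_boolean_strict is 
hypothetical and this is not the attached patch. Note that \0x1 in the quoted 
code is a reference to the number 1, which stringifies to '1' (byte 0x31) when 
dereferenced, consistent with the '31' in the test output; the sketch uses 
string escapes instead.

```perl
use strict;
use warnings;

# Hypothetical strict encoder: accept only the numbers 0 and 1.
sub encode_boolean_strict {
    my ($data) = @_;
    die "boolean must be 0 or 1\n"
        unless defined($data) && !ref($data) && $data =~ /\A[01]\z/;
    # Return a real one-byte string rather than a reference to a number.
    return $data ? "\x01" : "\x00";
}

print unpack( 'H*', encode_boolean_strict(0) ), "\n";    # prints 00
print unpack( 'H*', encode_boolean_strict(1) ), "\n";    # prints 01

# Anything else is rejected instead of being silently encoded.
for my $bad ( undef, '', 'yes', { a => 1 } ) {
    my $ok = eval { encode_boolean_strict($bad); 1 };
    print "rejected\n" unless $ok;
}
```

Rejecting everything else means undef and strings stay available for the other 
branches of a union.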



--
This message was sent by Atlassian JIRA
(v6.2#6252)
