[ 
https://issues.apache.org/jira/browse/AVRO-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026111#comment-14026111
 ] 

John Karp commented on AVRO-1462:
---------------------------------

Sorry, I don't seem to be able to delete faulty postings. What I meant was:

In Perl, the distinction between strings and numbers is unusually soft. Both 
are considered 'scalars', and the only way to tell them apart is to look at the 
content. The reason it makes sense to accept 0-9 but not ०-९ etc. is 
consistency with Perl itself: Perl treats strings made of 0-9 as numbers, but 
warns on the others and evaluates them as 0:
{code}
% perl -le 'use warnings; print "2"*2'
4
% perl -le 'use warnings; print "२"*2'
Argument "M-`M-%M-*" isn't numeric in multiplication (*) at -e line 1.
0
{code}
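
A rough sketch of the same ASCII-versus-Unicode digit distinction, written in Python only because it is easy to run; the actual fix belongs in the Perl serializer. It assumes Python 3, where `\d` in a `str` pattern matches any Unicode decimal digit. The helper names are hypothetical and not part of any Avro API:

```python
import re

# Hypothetical pattern: optional minus sign followed by ASCII digits 0-9 only.
ASCII_INT = re.compile(r"-?[0-9]+")
# For comparison: in Python 3, \d on str patterns also matches non-ASCII decimal digits.
UNICODE_INT = re.compile(r"-?\d+")


def looks_like_ascii_int(text):
    """True only if the whole string is an optional '-' plus ASCII digits."""
    return ASCII_INT.fullmatch(text) is not None


def looks_like_unicode_int(text):
    """True if the whole string is an optional '-' plus any Unicode decimal digits."""
    return UNICODE_INT.fullmatch(text) is not None


if __name__ == "__main__":
    # "\u0968" is DEVANAGARI DIGIT TWO, "\u0661" is ARABIC-INDIC DIGIT ONE.
    for sample in ("2", "-17", "\u0968", "\u0661"):
        print(repr(sample), looks_like_ascii_int(sample), looks_like_unicode_int(sample))
```

Only the `[0-9]` form rejects the Devanagari and Arabic-Indic digits, which is the behaviour that matches what Perl itself treats as numeric.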

For contrast, both Python implementations are able to do an explicit type 
check:
{code}
return (isinstance(datum, int)
        and (INT_MIN_VALUE <= datum <= INT_MAX_VALUE))
{code}
Neither will accept a string consisting of digits, Indic, Arabic or otherwise. 
The same goes for PHP and Ruby:
{code}
      case self::INT_TYPE:
        return (is_int($datum)
                && (self::INT_MIN_VALUE <= $datum)
                && ($datum <= self::INT_MAX_VALUE));
      case self::LONG_TYPE:
        return (is_int($datum)
                && (self::LONG_MIN_VALUE <= $datum)
                && ($datum <= self::LONG_MAX_VALUE));
{code}
{code}
      when :int
        (datum.is_a?(Fixnum) || datum.is_a?(Bignum)) &&
            (INT_MIN_VALUE <= datum) && (datum <= INT_MAX_VALUE)
      when :long
        (datum.is_a?(Fixnum) || datum.is_a?(Bignum)) &&
            (LONG_MIN_VALUE <= datum) && (datum <= LONG_MAX_VALUE)
{code}
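
To make the contrast with Perl concrete, here is a minimal runnable sketch of the Python-style check. The bounds assume Avro's `int` is a 32-bit signed integer, and `is_valid_int` is a hypothetical name used for illustration, not the function in either Python implementation:

```python
# Assumed bounds: Avro "int" is a 32-bit signed integer.
INT_MIN_VALUE = -(1 << 31)
INT_MAX_VALUE = (1 << 31) - 1


def is_valid_int(datum):
    # Type check first, then range check; strings are never coerced to numbers.
    return isinstance(datum, int) and (INT_MIN_VALUE <= datum <= INT_MAX_VALUE)


if __name__ == "__main__":
    # An in-range int, an ASCII digit string, a Devanagari digit string,
    # and an int just past the upper bound.
    for datum in (2, "2", "\u0968", 1 << 31):
        print(repr(datum), is_valid_int(datum))
```

Because the type is checked before the value, digit strings of any script are rejected outright, so the question of which digits count as numeric never arises the way it does in Perl.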

> Non-ASCII decimal characters cause warning from Perl API serializer
> -------------------------------------------------------------------
>
>                 Key: AVRO-1462
>                 URL: https://issues.apache.org/jira/browse/AVRO-1462
>             Project: Avro
>          Issue Type: Bug
>          Components: perl
>            Reporter: John Karp
>            Assignee: John Karp
>            Priority: Minor
>             Fix For: 1.7.7
>
>         Attachments: AVRO-1462.patch
>
>
> The serializer is using the \d metacharacter in a regex to check for decimal 
> characters. However, \d also matches non-ASCII decimals such as those from 
> Hindi or Arabic, and that causes this warning:
> {noformat}
> Argument "\x{661}" isn't numeric in abs at 
> /home/johnkarp/git/avro/lang/perl/blib/lib/Avro/BinaryEncoder.pm line 92.
> {noformat}
> Test case:



--
This message was sent by Atlassian JIRA
(v6.2#6252)
