[ 
https://issues.apache.org/jira/browse/PIG-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated PIG-4265:
----------------------------------
    Description: 
$PIG_HOME/bin/pig -x local RubyUDFs_10.pig
#RubyUDFs_10.pig

a = load 'studenttab10k' using PigStorage() as (name, age:int, gpa:double);
b = group a by name;
c = foreach b generate group, SUM(a.age), SUM(a.gpa);
d = foreach c generate $0, $1, (double)((int)$2*100)/100;
store d into 'local.output/RubyUDFs_10_benchmark.out';

The result in RubyUDFs_10.out/part-r-00000:
#grep "david s" RubyUDFs_10.out/part-r-00000 
david steinbeck 266     15.0

#grep "david s" studenttab10k
david steinbeck 21      2.44
david steinbeck 33      1.17
david steinbeck 42      1.94
david steinbeck 42      1.35
david steinbeck 31      2.77
david steinbeck 40      2.42
david steinbeck 57      3.91

Summing all the gpa values of "david steinbeck" in the file "studenttab10k"
gives "16", but the result in RubyUDFs_10.out/part-r-00000 is "15". The cause
is a double precision problem in AlgebraicDoubleMathBase.java: the gpa values
sum to 15.999999 instead of 16, and (double)((int)15.999999*100)/100 = 15.0
because the (int) cast truncates the sum to 15 before the multiplication.
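
The truncation step can be reproduced in plain Java, outside Pig. The sketch
below is only an illustration: TruncationDemo and its method names are
hypothetical, and 15.999999 is the value quoted in this report, not a
measured one. It compares the cast-then-scale expression from the script with
a scale-then-round alternative.

```java
// Hypothetical demo class; not part of Pig.
class TruncationDemo {
    // Mirrors the Pig expression (double)((int)$2*100)/100: the (int) cast
    // binds tighter than '*', so the fraction is dropped before scaling.
    static double truncate(double sum) {
        return (double) ((int) sum * 100) / 100;
    }

    // Alternative: scale first, then round to the nearest hundredth.
    static double roundToHundredths(double sum) {
        return Math.round(sum * 100) / 100.0;
    }

    public static void main(String[] args) {
        double sum = 15.999999; // value quoted in this report (assumption)
        System.out.println(truncate(sum));          // prints 15.0
        System.out.println(roundToHundredths(sum)); // prints 16.0
    }
}
```

Rounding after scaling tolerates a sum that is off by a tiny amount in either
direction, whereas the cast turns a tiny error into a whole unit.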

{code}
AlgebraicDoubleMathBase.java
    private static Double doWork(Double arg1, Double arg2, KNOWN_OP op) {
        if (arg1 == null) {
            return arg2;
        } else if (arg2 == null) {
            return arg1;
        } else {
            switch (op) {
            case MAX: return Math.max(arg1, arg2);
            case MIN: return Math.min(arg1, arg2);
            case SUM: return arg1+arg2;  // this line has the "Java double precision problem"
            default: return null;
            }
        }
    }
{code}
For details on the Java double precision problem, refer to
"https://community.oracle.com/thread/2448849?tstart=0".
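
To see the rounding error in isolation, the seven gpa values listed above can
be summed once with double and once with java.math.BigDecimal. This is a
minimal sketch under assumptions: GpaSumDemo and its methods are hypothetical
names, and it is not presented as the change made in PIG-4265.patch.

```java
import java.math.BigDecimal;

// Hypothetical demo class; not part of Pig.
class GpaSumDemo {
    // gpa values of "david steinbeck" as listed in this report
    static final String[] GPAS = {
        "2.44", "1.17", "1.94", "1.35", "2.77", "2.42", "3.91"
    };

    // Pairwise double addition, as in "arg1+arg2"; may carry a rounding error.
    static double doubleSum() {
        double sum = 0.0;
        for (String g : GPAS) {
            sum += Double.parseDouble(g);
        }
        return sum;
    }

    // Exact decimal addition; parsing from strings avoids binary rounding.
    static BigDecimal exactSum() {
        BigDecimal sum = BigDecimal.ZERO;
        for (String g : GPAS) {
            sum = sum.add(new BigDecimal(g));
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("double sum:     " + doubleSum());
        System.out.println("BigDecimal sum: " + exactSum());
        System.out.println("(int) of exact: " + exactSum().intValue());
    }
}
```

The exact decimal sum is 16.00, so truncating it to an int yields 16. The
double sum depends on the order of the additions and may land just below 16.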



  was:
$PIG_HOME/bin/pig -x local RubyUDFs_10.pig
#RubyUDFs_10.pig
register '/home/zly/prj/oss/pig/bin/libexec/ruby/scriptingudfs.rb' using jruby 
as myfuncs;
a = load 'studenttab10k' using PigStorage() as (name, age:int, gpa:double);
b = group a by name;
c = foreach b generate group, myfuncs.Sum(a.age), myfuncs.Sum(a.gpa);
d = foreach c generate $0, $1, (double)((int)$2*100)/100;
store d into 'RubyUDFs_10.out';

The result in RubyUDFs_10.out/part-r-00000:
#grep "david s" RubyUDFs_10.out/part-r-00000 
david steinbeck 266     15.0

#grep "david s" studenttab10k
david steinbeck 21      2.44
david steinbeck 33      1.17
david steinbeck 42      1.94
david steinbeck 42      1.35
david steinbeck 31      2.77
david steinbeck 40      2.42
david steinbeck 57      3.91

Summing all the gpa values of "david steinbeck" in the file "studenttab10k"
gives "16", but the result in RubyUDFs_10.out/part-r-00000 is "15". The cause
is a double precision problem in AlgebraicDoubleMathBase.java: the gpa values
sum to 15.999999 instead of 16, and (double)((int)15.999999*100)/100 = 15.0
because the (int) cast truncates the sum to 15 before the multiplication.

{code}
AlgebraicDoubleMathBase.java
    private static Double doWork(Double arg1, Double arg2, KNOWN_OP op) {
        if (arg1 == null) {
            return arg2;
        } else if (arg2 == null) {
            return arg1;
        } else {
            switch (op) {
            case MAX: return Math.max(arg1, arg2);
            case MIN: return Math.min(arg1, arg2);
            case SUM: return arg1+arg2;  // this line has the "Java double precision problem"
            default: return null;
            }
        }
    }
{code}
For details on the Java double precision problem, refer to
"https://community.oracle.com/thread/2448849?tstart=0".




> AlgebraicDoubleMathBase has "Java double precision problems"
> ------------------------------------------------------------
>
>                 Key: PIG-4265
>                 URL: https://issues.apache.org/jira/browse/PIG-4265
>             Project: Pig
>          Issue Type: Bug
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>         Attachments: PIG-4265.patch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
