[ 
https://issues.apache.org/jira/browse/PIG-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated PIG-4265:
----------------------------------
    Description: 
$PIG_HOME/bin/pig -x local RubyUDFs_10.pig
#RubyUDFs_10.pig

a = load 'studenttab10k' using PigStorage() as (name, age:int, gpa:double);
b = group a by name;
c = foreach b generate group, SUM(a.age), SUM(a.gpa);
d = foreach c generate $0, $1, (double)((int)$2*100)/100;
store d into 'local.output/RubyUDFs_10_benchmark.out';

the result in RubyUDFs_10.out/part
#grep "david s" RubyUDFs_10.out/part-r-00000 
david steinbeck 266     15.0

#grep "david s" studenttab10k
david steinbeck 21      2.44
david steinbeck 33      1.17
david steinbeck 42      1.94
david steinbeck 42      1.35
david steinbeck 31      2.77
david steinbeck 40      2.42
david steinbeck 57      3.91


When running Ruby_UDFs.pig in Spark mode, SUM(a.gpa) is 16.0, so
(double)((int)$2*100)/100 is 16.0 and the output row is
"david steinbeck       266     16.0".
When running Ruby_UDFs.pig in MapReduce mode, SUM(a.gpa) is
15.999999999999998, so (double)((int)$2*100)/100 is 15.0 and the output
row is "david steinbeck       266     15.0".
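
The jump from 16.0 to 15.0 is consistent with the (int) cast being applied to
$2 before the multiplication, so the fractional part is dropped first. A
minimal Python sketch of that arithmetic follows. Python floats are IEEE 754
doubles like Java's; the helper name truncate_expr is hypothetical, and int()
stands in for Pig's (int) cast, which is assumed here to truncate toward zero
(this matches the reported output).

```python
def truncate_expr(total):
    # Mirrors (double)((int)$2*100)/100 under the assumption that the
    # cast binds to $2 alone: truncate first, then multiply and divide.
    return float(int(total) * 100) / 100

print(truncate_expr(16.0))                # 16.0
print(truncate_expr(15.999999999999998))  # 15.0
```

A difference of one unit in the last place of the sum therefore becomes a
difference of 1.0 in the stored result.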

I don't know why the same script returns different results on different
execution engines (Spark and MapReduce) on the same OS.
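
One possible explanation, not confirmed in this report, is that double
addition is not associative, so the sum can depend on the order in which the
values (or partial sums) are combined. The Python sketch below sums the seven
gpa values for "david steinbeck" in every possible order and collects the
distinct results. It is only an illustration: the helper left_to_right_sum is
hypothetical, and the order each engine actually uses is an assumption that
would need to be checked against the engine code.

```python
from itertools import permutations

# gpa values for "david steinbeck" taken from the grep output above.
gpas = [2.44, 1.17, 1.94, 1.35, 2.77, 2.42, 3.91]

def left_to_right_sum(values):
    # Plain sequential accumulation in a double, one value at a time.
    total = 0.0
    for v in values:
        total += v
    return total

# Every ordering of the same seven inputs; the set keeps distinct sums only.
distinct = sorted({left_to_right_sum(p) for p in permutations(gpas)})
print(distinct)
```

If more than one value is printed, the same input rows can legitimately
produce sums that differ in the last digit, and the (int) cast in the script
then amplifies that difference.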



  was:
$PIG_HOME/bin/pig -x local RubyUDFs_10.pig
#RubyUDFs_10.pig

a = load 'studenttab10k' using PigStorage() as (name, age:int, gpa:double);
b = group a by name;
c = foreach b generate group, SUM(a.age), SUM(a.gpa);
d = foreach c generate $0, $1, (double)((int)$2*100)/100;
store d into 'local.output/RubyUDFs_10_benchmark.out';

the result in RubyUDFs_10.out/part
#grep "david s" RubyUDFs_10.out/part-r-00000 
david steinbeck 266     15.0

#grep "david s" studenttab10k
david steinbeck 21      2.44
david steinbeck 33      1.17
david steinbeck 42      1.94
david steinbeck 42      1.35
david steinbeck 31      2.77
david steinbeck 40      2.42
david steinbeck 57      3.91


When running Ruby_UDFs.pig in Spark mode, SUM(a.gpa) is 16.0, so
(double)((int)$2*100)/100 is 16.0 and the output row is
"david steinbeck       266     16.0".
When running Ruby_UDFs.pig in MapReduce mode, SUM(a.gpa) is
15.999999999999998, so (double)((int)$2*100)/100 is 15.0 and the output
row is "david steinbeck       266     15.0".




> AlgebraicDoubleMathBase has "Java double precision problems"
> ------------------------------------------------------------
>
>                 Key: PIG-4265
>                 URL: https://issues.apache.org/jira/browse/PIG-4265
>             Project: Pig
>          Issue Type: Bug
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
