[ https://issues.apache.org/jira/browse/PIG-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hiten Java updated PIG-3668:
----------------------------

    Attachment: CORR.diff

Patch file attached. Removed the return statements from the catch blocks.

> COR built-in function when at least one of the coefficient values is NaN
> ------------------------------------------------------------------------
>
>                 Key: PIG-3668
>                 URL: https://issues.apache.org/jira/browse/PIG-3668
>             Project: Pig
>          Issue Type: Bug
>          Components: internal-udfs
>    Affects Versions: 0.12.0
>            Reporter: Hiten Java
>         Attachments: CORR.diff
>
>
> When multiple columns are passed to COR for correlation analysis, and the
> coefficient for one of the column combinations is NaN, the coefficients for
> all other combinations are not computed.
> The Pearson coefficient is NaN if all values in a given column are the same.
> Example:
>     A = LOAD 'myData' USING org.apache.hcatalog.pig.HCatLoader();
>     B = group A all;
>     c = foreach B generate group, FLATTEN(COR((bag{tuple(double)}) A.col_1,
>             (bag{tuple(double)}) A.col_2, (bag{tuple(double)}) A.col_3,
>             (bag{tuple(double)}) A.col_4));
> If the Pearson coefficient for col_1 and col_2 is NaN, then the coefficients
> for all combinations are NaN.
> This happens because of the 'return null' statements in the catch blocks on
> lines 157 and 235 of org.apache.pig.builtin.COR.java.
> If the catch block is removed, the correlation analysis continues for the
> remaining columns. (Apache Pig 0.12.0)
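The behaviour described in the report can be sketched in plain Java. This is a hypothetical illustration, not Pig's COR implementation: the class PairwiseCorrelation, its methods, the length-mismatch exception used to trigger the catch block, and recording a failed pair as NaN are all assumptions made for the example. It computes the Pearson coefficient for each column pair with the sum-based formula. A constant column makes both the numerator and the denominator zero, so its coefficient evaluates to NaN (0.0 / 0.0) without throwing. Each pair is computed inside its own try/catch, so one failing pair is recorded and the loop moves on, instead of a 'return null' discarding the results for every pair.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Pig's COR UDF: computes the Pearson coefficient for
// every pair of columns independently, so one NaN (or one failed pair) does
// not wipe out the results for the other pairs.
class PairwiseCorrelation {

    // Sum-based Pearson formula. A constant column makes both numerator and
    // denominator 0, and 0.0 / 0.0 evaluates to NaN in Java (no exception).
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length) {
            // Assumed failure mode, used only to exercise the catch block below.
            throw new IllegalArgumentException("columns differ in length");
        }
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumX2 = 0, sumY2 = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumX2 += x[i] * x[i];
            sumY2 += y[i] * y[i];
        }
        double numerator = n * sumXY - sumX * sumY;
        double denominator =
            Math.sqrt((n * sumX2 - sumX * sumX) * (n * sumY2 - sumY * sumY));
        return numerator / denominator;
    }

    // Coefficients for pairs (0,1), (0,2), ..., (1,2), ... in that order.
    static List<Double> correlateAll(double[][] columns) {
        List<Double> results = new ArrayList<>();
        for (int i = 0; i < columns.length; i++) {
            for (int j = i + 1; j < columns.length; j++) {
                double r;
                try {
                    r = pearson(columns[i], columns[j]);
                } catch (IllegalArgumentException e) {
                    // A 'return null' here would discard every pair; instead,
                    // record this pair as NaN and continue with the next one.
                    r = Double.NaN;
                }
                results.add(r);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        double[][] columns = {
            {2, 2, 2, 2}, // col_1: constant, so every pair with it is NaN
            {1, 2, 3, 4}, // col_2
            {2, 4, 6, 8}, // col_3
            {4, 3, 2, 1}  // col_4
        };
        // prints [NaN, NaN, NaN, 1.0, -1.0, -1.0]
        System.out.println(correlateAll(columns));
    }
}
```

With col_1 held constant, only the three pairs involving col_1 come out as NaN; the coefficients for col_2/col_3, col_2/col_4 and col_3/col_4 are still computed.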