Hiten Java created PIG-3668: ------------------------------- Summary: COR built-in function when atleast one of the coefficient values is NaN Key: PIG-3668 URL: https://issues.apache.org/jira/browse/PIG-3668 Project: Pig Issue Type: Bug Components: internal-udfs Affects Versions: 0.11.1, 0.12.0, 0.11 Reporter: Hiten Java Priority: Trivial
When passing multiple column keys for Correlation analysis, if coefficient value of one of the combinations is NaN, then the value for all other combinations is not computed. Pearson Co-efficient value is NaN if all values for a given column are the same. Example: A = LOAD 'myData' USING org.apache.hcatalog.pig.HCatLoader(); B = group A all; c = foreach B generate group, FLATTEN(COR((bag{tuple(double)}) A.col_1,(bag{tuple(double)}) A.col_2, (bag{tuple(double)}) A.col_3, (bag{tuple(double)}) A.col_4)); If the value of pearson coefficient for col_1 and col_2 is NaN, then value of co-efficients for all combinations is NaN This is happening because of 'return null' statement in catch block on lines 157 and 235 in file org.apache.pig.builtin.COR.java If the catch block is removed, then the correlation analysis would continue for the remaining columns. (ApachePig 0.12.0) -- This message was sent by Atlassian JIRA (v6.1.5#6160)