GitHub user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/206#discussion_r154422562
  
    --- Diff: src/ports/postgres/modules/stats/correlation.sql_in ---
    @@ -207,8 +204,17 @@ Result:
     </pre>
     
     @par Notes
    -Current implementation ignores a row that contains NULL entirely. This means
    -any correlation in such a row (with NULLs) does not contribute to the final answer.
    +
    +Null values will be replaced by the mean of their respective columns (Mean imputation/substitution). Mean imputation is a method in which the missing value on a certain variable is replaced by the mean of the available cases. This method maintains the sample size and is easy to use, but the variability in the data is reduced, so the standard deviations and the variance estimates tend to be underestimated. Please refer to [1] and [2] for details.
    --- End diff --
    
    This paragraph needs to be wrapped so that each line is within 80 characters.
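
    For readers of the note in the diff above, here is a minimal sketch of
    mean imputation and its effect on the variance estimate. It is plain
    Python rather than the module's SQL, and the helper name `impute_mean`
    and the sample column are hypothetical, chosen only for illustration;
    it is not the implementation in this pull request.

```python
from statistics import mean, variance


def impute_mean(column):
    """Replace None entries with the mean of the observed (non-None) entries.

    Hypothetical helper for illustration only; not part of MADlib.
    """
    observed = [x for x in column if x is not None]
    if not observed:
        raise ValueError("column has no observed values")
    col_mean = mean(observed)
    return [col_mean if x is None else x for x in column]


# Made-up sample column with two missing values.
column = [2.0, 4.0, None, 6.0, None, 8.0]
observed = [x for x in column if x is not None]
imputed = impute_mean(column)

# The sample size is preserved: the imputed column has as many rows as the input.
print(len(column), len(imputed))

# The imputed entries sit exactly at the mean, so they add nothing to the sum
# of squared deviations while still increasing the divisor. The sample variance
# of the imputed column is therefore smaller than that of the observed values.
print(variance(observed))
print(variance(imputed))
```

    The second printed variance is smaller than the first, which is the
    underestimation the note describes.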

